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A  CLASS  OF  ALGORITHMS  FOR  AUTOMATIC  EVALUATION  OF 
CERTAIN  ELEMENTARY  FUNCTIONS  IN  A  BINARY  COMPUTER 
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The  time  required  to  evaluate  elementary  transcendental  functions 
in  a  digital  computer  can  often  be  significantly  reduced  by  performing  the 
algorithms  in  hardware  rather  than  software  form,  providing  that  efficient 

r 

hardware  algorithms  can  be  developed.  Such  time  reduction  may  be  important 
in  batch  processing  on  a  large  machine  when  the  mix  of  problems  includes 
frequent  evaluation  of  such  functions  and  may  be  essential  in  a  special 
purpose  real-time  machine  such  as  the  guidance  system  of  an  airborne  vehicle 
where  an  increase  in  speed  is  justified  at  almost  any  cost. 

In  situations  which  warrant  the  increase  in  hardware  investment, 
primarily  in  control  complexity,  necessary  to  implement  these  algorithms, 
schemes  which  utilize  redundant  number  recodings  are  also  quite  well  justi- 
fied. Ordinarily  the  introduction  of  redundancy,  that  is,  allowing  more  than 
r  digital  values  in  radix  r  so  that  the  representation  of  numbers  is  no 
longer  unique,  provides  an  increase  in  speed  in  exchange  for  some  increase 
in  hardware  complexity.  Usually  the  increase  in  hardware  investment  is 
moderate  relative  to  the  cost  of  the  machine. 

Algorithms  to  evaluate  common  elementary  functions  to  register 
length  precision  in  from  one  to  three  "multiplication  cycle  times"  are 
developed;  a  "multiplication  cycle  time"  is  defined  as  the  time  required  to 
perform,  on  the  average,  M  low  precision  comparisons,  M  shifting  operations, 
and  M/3  additions  where  M  is  the  length  of  the  mantissa  of  a  floating-point 
operand.  The  table  below  summarizes  the  results.  A  small  register-speed  read- 
only-memory containing  less  than  100  precomputed  constants  is  required. 
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TABLE 


Function 

Division 
Logarithm 
Square  Root 
Exponential 
Tangent 
Cosine/Sine 
Arctangent 
Arccosine/  Arcsine 


Multiplication 
Cycle  Times 

1 
1 
1 

3 
2 
2 
1 
3 


It  may  be  possible  to  extend  the  techniques  employed  in  the  development  of 
these  algorithms  to  a  much  wider  class  of  functions. 

Algorithms  which  are  readily  amenable  to  implementation  in  radix 
r  =  2  ,   n  >  1,  promise  even  greater  increases  in  speed  than  those  limited  to 
radix  2  implementation.  Of  the  algorithms  listed  in  the  table,  only  the  in- 
verse trigonometric  fail  to  fall  in  this  category. 


1.   INTRODUCTION 

1.1  Justification  of  the  research  effort 

The  continuing  reduction  of  the  size  and  cost  and  the  increasing 
reliability  of  integrated  circuit  logic  elements,  especially  LSI,  encourages 
the  utilization  of  more  complex  logic  networks  in  the  arithmetic  unit  of  a 
digital  computer.  Furthermore,  since  in  most  large  modern  machines  the  cost 
of  the  arithmetic  unit  is  only  a  small  portion  of  the  cost  of  the  entire 
machine,  it  is  possible  to  substantially  increase  the  percentage  cost  of  the 
arithmetic  unit  without  appreciably  affecting  its  relative  cost.   In  arith- 
metic units  these  cost  and  size  reductions  encourage  the  replacement  of  very 
frequently  used  programmed  subroutines  by  built-in  function  generators. 
Division  is  now  almost  universally  hard-wired;  algorithms  for  hard-wired 
square  root  have  often  been  proposed.   In  future  machines,  especially  those 
intended  to  perform  substantial  numbers  of  scientific  calculations,  it  may 
well  be  feasible  to  hard-wire  algorithms  to  evaluate  logarithm,  exponential, 
cosine/sine,  tangent,  and  arctangent  as  well  as  square  root.  With  these 
thoughts  in  mind,  an  investigation  leading  to  the  development  of  such 
algorithms  was  undertaken. 

1.2  Redundancy  and  a  measure  of  efficiency 

The  introduction  of  redundancy,  for  example  allowing  digital  values 
1,  0,  1  (where  1  means  -l)  in  binary,  allows,  in  general,  several  different 
representations  of  any  particular  number.  Thus  one  can  choose  a  recoding 
procedure  which  increases  the  probability  of  a  zero,  pn,  which  increases  the 
shift  average  <S>  =  l/(l-p  ). 

Making  the  usual  assumption  that  0  and  1  are  equally  likely,  one 
can  see  that  for  conventionally  represented  numbers,  p0  =  l/2  and  <S>  =  2. 


It  is  known  that  by  introducing  minimal  redundancy  (three  values  1,  0,  1  in 
radix  2)  one  can  increase  p  to  2/3  and  <S>  to  3.   It  is  also  known  that  one 
cannot  achieve  a  higher  shift  average  with  minimal  redundancy.  The  feasibil- 
ity of  further  increasing  redundancy  in  the  algorithms  developed  in  this 
research  has  not  been  studied  because  it  is  generally  felt  that  the  speed- 
hardware  ratio  diminishes  rapidly. 

1.3  Previous  accomplishments 

Very  considerable  research  has  been  and  is  being  directed  toward 
formulation  of  fast  division  schemes,  including,  for  example,  the  SRT  (and 
scaled  SRT  )  division  algorithm  which  employs  redundant  representation  of 
output  digits  to  achieve  greater  speed.   Several  algorithms  have  also  been 
developed  for  evaluation  of  various  other  elementary  functions;  included  are 
the  works  of  Meggitt,   Specker,   Senzig,   and  Voider.   Most  of  the  algorithms 
devised  to  date  employ  non-redundant  number  representation,  although  often 
digital  values  1,  1  are  used  rather  than  0,  1.  The  same  factors  which 
encourage  hard-wiring  of  algorithms  for  evaluation  of  elementary  functions 
also  encourage  the  use  of  redundant  number  recodings  which  require  more  hard- 
ware in  exchange  for  greater  speed. 

The  analysis  of  the  SRT  division  and  a  study  of  the  correspondence 
between  that  division  and  multiplier  recoding  procedures  carried  out  by 
Robertson,   '   Penhollow,  Frieman,   Shively,   and  others  lead  to  the 
conjecture  by  the  author  that  redundancy  techniques  could  be  favorably 

employed  in  achieving  greater  efficiency  in  algorithms  beyond  division  and 

12 
square  rooting.    That  that  conjecture  holds  some  validity,  at  least  under 

the  assumptions  discussed  below,  is  verified  in  this  paper. 
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l.k     Assumptions 

It  is  assumed  throughout  this  paper  that  the  time  required  to 
perform  addition  is  significantly  greater  than  the  time  necessary  to  perform 
low  precision  comparisons,  shifting  operations,  or  complementations.   It  is 
further  assumed  that,  with  present  hardware  technology,  one  can  economically 
build  small  (less  than  100  words)  read  only  memories  which  can  be  accessed 
at  register  speed.   It  is  strongly  felt  that  these  assumptions  are  justified 
by  presently  available  hardware. 

1.5  Procedure 

The  fundamental  technique  employed  throughout  this  research  is  the 
"normalization"  of  a  given  operand  (or  a  simple  function  of  that  operand)  by 
means  of  a  continued  product  or  continued  summation,  the  terms  of  which  may 
be  easily  chosen  in  a  step-by-step  process.  The  procedure  for  formulating 
an  algorithm  consists  of  the  following  operations:   (l)   choice  of  an  appro- 
priate function  of  a  given  operand  to  be  "normalized";  (2)  approximation  of 
that  function  by  a  continued  product  or  continued  sum  whose  terms  are  easily 
chosen;  (3)  utilization  of  the  individual  terms  comprising  that  product  or 
sum,  as  they  are  chosen,  to  form  the  desired  function;  (k)      choice  of  an 
appropriate  rule  for  selecting  the  individual  terms;  (5)   scaling  of  the 
operand  upon  which  the  selection  rule  is  based  so  that  the  rule  is  the  same 
for  every  step  in  the  recursion.  The  procedure  is  best  illustrated  by  the 
following  example. 

One  scheme  for  performing  division  is  to  form  the  reciprocal  of  the 
divisor  as  a  continued  product  of  simple  "multiplier"  constants,  using  each 
of  the  "multipliers",  as  it  is  chosen,  to  form  a  better  approximation  to  the 
quotient;  viz. 


1* 

q,     ..  =  q_  d  ,  q  =  dividend 
k+1    k  k'   0 


where 


-■Id.  ->0. 
X  i=0  x 

In  this  particular  scheme,  the  quantity 

1   k 

--Ed 

x  i=o  x 

k 
is   "normalized"    to   zero  implicitly  by  explicitly  forcing  x  IT  d,-    to  unity. 

1=0     k 

One  must  formulate  a  selection  rule,  based  on  the  quantity  1  -  x  H  dn- ,  to 

i=0  1' 

choose  the  next  multiplier.   It  is  convenient  to  scale  the  partially  normal- 
ized quantity,  upon  which  the  selection  rule  is  based,  in  such  a  way  that 
the  selection  rule  is  the  same  for  every  step  of  the  recursion.  A  convenient 
scaling  here  is 


u  =  2  (1  -  x  n  d  ) 
*  i=0  X 


with  the  resulting  recursion 


Vi  =  2\  +  \  +  2    sk\ 


where 


dk 


1  +  s  2 
k 


-k 


■  ( 


1  if  u  <  -  c 
k 

0  otherwise 

1  if  u  >  +  c 


for  some  comparison  constant  c.  The  efficiency  of  the  algorithm  is  strongly 
dependent  on  the  form  of  the  recursion  for  two  reasons:   first,  the  more 
complicated  the  recursion  the  more  time  and/ or  hardware  required  to  implement 
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it;  second,  the  form  of  the  recursion  determines  the  difficulty  of  formulating 
a  higher  radix  implementation  which  could  lead  to  even  greater  speed. 

This  fundamental  technique,  termed  the  "normalization"  technique 
in  this  paper,  is  discussed  in  detail  in  Section  2.   In  Sections  3  through  12 
the  technique  is  applied  to  various  elementary  functions — division,  square 
rooting,  logarithm,  exponential,  the  circular  functions,  and  their  inverses. 
For  each  algorithm  developed,  an  error  bound  and  some  methods  of  implementa- 
tion are  discussed.  Finally,  in  Section  13,  some  considerations  of  importance 
in  a  higher  radix  implementation  are  discussed. 

1.6  Results 

With  three  adder  circuit  configurations  of  the  type  discussed  in 
this  paper,  one  can  perform  the  algorithms  listed  below  in  the  approximate 
average  amounts  of  time  indicated.  A  "multiplication  cycle  time"  is  the  time 
required  to  perform  M  low  precision  (3  or  k   bits)  comparisons,  M  shifting 
operations,  and  M/3  additions/subtractions,  where  M  is  the  length  of  the 
mantissa. 


TABLE  1 

Function 

Range  of  Operand  Allowed 

Mult 
Cyc 

iplication 
le  Times 

Division 

Machine  Range 

1 

Logarithm 

X  >  0 

1 

Square  Root 

X  >  0 

1 

Exponential 

Machine  Range 

3 

Tangent 

0  <  X  <  2n 

2 

Cosine/Sine 

0  <  X  <  2n 

2 

Arctangent 

Machine  Range 

1 

Arccosine/Arcsine 

0  <  X  <  1 

3 

Worst  case  error  bounds  are  developed  for  each  algorithm;  typically 
worst  case  errors  are  less  than  one  part  in  2 
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2.   THE  NORMALIZATION  TECHNIQUE 

2.1  Normalization  schemes  for  division 
2.1.1  Basic  principle  of  normalization 

Most  of  the  division  and  square  root  algorithms  known  to  the 
author  appear  to  be  derived  on  the  basis  of  a  hand -computation  technique. 
Although  some  of  the  algorithms  derived  from  the  normalization  technique 
used  in  this  research  have  been  known  for  some  time,  it  is  believed  that 
the  derivation  by  this  technique  provides  further  insight  into  the  problem 
of  automatic  function  evaluation  because  the  normalization  technique  can  be 
extended  to  algorithms  beyond  division  and  square  root  for  which  hand- 
computation  methods  are  not  known.  Considerable  attention  is  devoted  in 
this  paper  to  division  and  square  root,  not  because  the  algorithms  devised 
are  better  than  those  previously  known  (indeed,  some  coincide),  but  rather 
because  it  is  believed  that  such  a  discussion  leads  to  a  fuller  understanding 
of  the  general  technique.  With  this  thought  in  mind,  let  us  begin  with  a 
normalization  approach  division  algorithm. 

Suppose  one  wishes  to  find  the  quotient  of  two  numbers  represented 
in  normalized  binary  floating-point  format, 

X  =  x  •  2a  x,y  e  [l/2,  l) 

Y  =  y  •  Z  a,p  integers. 

Let  X  be  the  divisor  and  Y  be  the  dividend.  The  exponent  of  the  quotient  Q, 
presents  no  particular  problem,  differing  by  at  most  one  from  the  quantity 

(p-a). 


q  =  I  =  J£s£  =  z  .  20-oO  =   .  2(^) 


X       JX        x 
x«2 

where  q  =  y/x;  q  e(l/2,  2).  One  is  thus  able  to  concentrate  his  efforts  on 

the  fractional  parts  of  the  operands  and  devise  a  convenient  algorithm  for 

computing  q. 

Suppose  one  multiplies  both  the  dividend  and  divisor  by  a  sequence 

of  (non-zero)  constants  {d.}  such  that  the  resulting  divisor  tends  towards 

unity;  one  produces  the  quotient  q. 


M 

y;\  di      M 

x     M        .  ~     1 
1=0 


x  IT  d. 
i=0  : 


M 
if   x  TT  d.  =   1. 
i=0  x 


Clearly  the  choice  of  the  set  of  constants  {d.}  is  critical,  both  for 
"normalizing"  the  divisor  towards  unity  and  for  performing  the  indicated 
multiplications . 


form 


It  is  convenient  to  choose  the  set  of  constants  {d  }  to  be  of  the 

i 


d.  =  1  ±  2"1. 

i 


One  might  choose  the  positive  sign  if  the  partially  normalized  divisor  is 
less  than  unity  and  the  negative  sign  otherwise.  Then  each  "multiplication" 
can  be  performed  by  a  shift  (of  i  places)  and  an  addition  (preceded  by  a 
complementation  if  the  sign  is  negative) .   If  two  complete  adder  circuits 
are  available,  both  the  partially  normalized  divisor  x^  and  the  partially 
developed  quotient  q  are  formed  simultaneously  and  independently. 


k 
*k+l  =  xo  .*0  di  =  *A>      xo  =  X 


\+i  =  %  .%  di  -  VV   *o  ■  y' 


The  time  required  to  perform  a  division  with  this  algorithm  is  approximately 
the  time  required  to  do  M  additions,  assuming  that  comparisons,  shifts,  and 
complements  are  very  much  faster  than  additions. 

The  above  process  is  the  normalization  approach  analog  of  the 
the  non-restoring  recoding  in  the  following  sense.  Let  us  write 

d.  =  1  +  s.2_1,   s.  =  [1,  1}  . 
i        l   '    l     ' 

Then  the  sequence  {s.}  represents  a  non-restoring  recoding  of  the  reciprocal 
of  the  divisor.  Note,  however,  that  the  quotient  is  obtained  in  conventional 
form  (digital  values  0,  l)  rather  than  in  recoded  form  (digital  values  1,  l). 

Since  in  the  above  algorithm,  the  probability  of  choosing  s.  =  0  is 
zero,  the  shift  average  <S>  =  l/(l-p  )  is  unity. 

One  can  increase  the  shift  average  (and  the  speed)  by  allowing 

s.  =  0.  For  example,  one  might  propose  the  following  alternate  selection 

th 
rules:   for  the  k   step, 

1       if  x^  >  1  -  2_k 


dk  = 


1  +  2_k  if  x.  <  1  -  2"k. 


The  division  would  then  be  completed  in  M/2  "addition  cycle  times,"  on  the 
average,  for  a  shift  average  of  two.  An  "addition  cycle  time,"  in  this 
context,  includes  the  time  required  for  comparison,  shift,  and  complement, 
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as  well  as  the  addition  itself.  This  alternate  algorithm  is  the  conventional 
recoding  with  digital  values  0,  1. 

Clearly,  one  can  go  a  step  further  and  introduce  redundancy,  that 
is,  allow  more  than  two  digital  values.   In  particular,  one  may  allow  digital 
values  1,  0,  1.  If  the  algorithm  is  properly  scaled  (i.e.,  the  proper 
selection  rules  are  chosen),  a  shift  average  approaching  three  can  be 
achieved.   It  is  known  that  a  scale  factor  (essentially  the  ratio  of  com- 
parison constant  to  multiplier)  in  the  range  [2/3,  5/6]  yields  a  recoding 
for  which  the  probability  of  a  zero  approaches  its  limit  of  2/3.   A  convenient 
scale  factor  in  the  allowed  range  is  3A» 

Thus  one  may  formulate  the  following  algorithm  for  division:   for 
the  k   step, 


d,  = 


1  +  s,  2_k,    1  <  k  <  M 
k   '      —   - 


s,  = 


1  if  x.  <  1  -  3A 

^  0      otherwise 
1  if  x^  >  1  +  3A 


-k 


(The  initial  multiplier  d  may  be  chosen  in  a  special  way.)   This  division 
algorithm  requires  an  average  of  only  M/3  addition  cycle  times  (if  two  adders 
are  available)  compared  to  M/2  addition  cycle  times  for  a  conventional  (0,  l) 
division  and  M  addition  cycle  times  for  the  common  non-restoring  (l,  l) 
division.   This  division  algorithm  is  discussed  in  detail  in  Section  3. 


2.1.2  Other  normalization  division  algorithms 

While  the  division  algorithm  just  proposed  has  been  chosen  for 
detailed  study  in  this  paper,  three  other  normalization  division  schemes  are 
known  and  are  discussed  here  for  completeness.   To  compare  the  four  algorithms, 
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it  is  convenient  to  write  each  in  a  recursive  form,  as  is  done  for  the  first 
algorithm  here.  Let 

k' 

ak+i  ■ 1  -  «<£  di) 

-  1  -  (1  -  ak)dk 


-k 
But  d  =  1  +  s  2   ,  so 

k        k   ' 

ak+i  ■  -%2'y' +  ak(1+sk2"k) 

ak+i  =  ak  +  sk  v"k  -  sk2"k  (s-x) 

where  GC     approaches  zero.  This  algorithm  is  a  multiplicative  reciprocal 

formation,  i.e., 

M 

TT  d.  S  - 
i=0  x   X 

which  simultaneously  uses  the  digits  of  the  reciprocal  to  form  the  quotient. 

Such  a  formulation  of  this  algorithm  leads  quite  naturally  to  an  analogous 

additive  reciprocal  formation,  represented  by  the  following  recursion. 

k 
a,  .    =   1  -  x(  Z  d! ) 
k+1        1-0  x 

where  d|  =  s.2   .   Then, 

l    l  ' 

<x  ,  =  1  - 


k+l  "   "  \+l 
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k 
ak+1  =  i-x(_Z  s.2"1) 

1=0 

k-1        . 

=  1  -  x(  E  s.2_1  +  s.  2   ) 
i=0  x      k 

k-1     .        , 
=  1  -  x(  E  s.2"1)  -xs2'k 
i=0  x  k 

=  ak  "  Sk  X  2"k  ^2'2^ 

where  <x  approaches  zero,  as  before.  This  algorithm  is  nothing  more  than  a 

bit  by  bit  recoding  of  the  reciprocal  of  the  divisor, 

M 

E  disi. 
1*0  X   X 

It  offers  the  advantage  of  a  simple  recursion,  but  requires  that  the  divisor 

be  set  aside  in  a  special  register  throughout  the  division.  Furthermore,  and 

more  importantly,  as  indicated  by  the  appearance  of  the  divisor  in  the 

recursion  formula,  in  order  to  achieve  a  minimal  recoding,  the  comparison 

constants  in  the  selection  rules  must  be  scaled  with  respect  to  the  divisor 

in  a  manner  similar  to  that  discussed  below  for  the  scaled  square  rooting 

algorithm.   Since  this  algorithm  offers  no  redeeming  features,  it  is  not 

proposed  for  implementation. 

Both  of  the  algorithms  discussed  above  are  reciprocal  formation 
algorithms  (although  the  quotient  may  be  formed  simultaneously).   Slight 
alterations  in  these  algorithms  allow  one  to  perform  the  division  directly. 

The  following  recursion  is  quite  analogous  to  the  algorithm 
represented  by  recursion  relation  (2-1). 


v  ■  y  -  x  .*0  ai 
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\+l  =  y  '  Vl 

=  y  -w 


y  -  (y^k)dk 


=  y  -  y  dk +  <\  \ 


=  y  -  y(l+sk2  *)   +  0=k(l+sk2  *) 

=  % +  \  sk2    "  y  \2  (2-3) 


where  a  approaches  zero.  Here 

K 

M 

H  d.  =  J  =  q. 
i=0  x   x 

This  algorithm  has  the  curious  property  that  the  comparison  constants  must 
be  scaled  with  respect  to  the  dividend,  rather  than  the  divisor,  to  achieve 
a  minimal  recoding.  One  must  again  set  aside  an  extra  register,  this  time 
to  hold  the  dividend.  The  only  redeeming  feature  of  this  algorithm  is  that 
Q^  ,  is  a  remainder  in  the  classical  sense  (the  difference  between  the  dividend 
and  the  product  of  the  divisor  and  the  quotient) .  This  advantage  is  not 
sufficient  to  offset  the  necessity  of  scaling  the  comparison  constants,  and 
this  algorithm  is  not  proposed  for  implementation. 

One  may  also  propose  a  slight  alteration  in  the  algorithm  repre- 
sented by  recursion  relation  (2-2). 

1=0 


=  y  - 


*k+i 


k 
av+l  =  7  -  x(  Z  s  2"1) 
k+i       i=0  x 

k-1 

=  y  -  x(  E  s.2  X  +  s,2~K) 
i=0        k 

k-1        . 

=  y  -  x(  E  s.2_1)  -  x  sn  2 
i=0  x         k 

=  ak  -  sk  x  2~k  (2-1+) 

where  a     approaches  zero,  as  before.   This  algorithm  is  a  bit  by  bit  recodin^ 


of  the  quotient, 


M 

E  d_!  =  ^  =  q. 


i=0  X   X 

This  last  division  algorithm,  derived  above  by  the  normalization  technique, 
may  be  recognized  to  be  the  well-known  SRT  division,  which  must  be  scaled 
with  respect  to  the  divisor  to  achieve  a  minimal  recoding.  Because  this 
division  has  been  analyzed  so  thoroughly  elsewhere,  it  need  not  be  discussed 
further  here . 

Thus  two  normalization  approaches  are  known,  one  multiplicative 
(continued  product)  and  one  additive  (continued  summation).  Each  approach 
leads  to  a  pair  of  possible  division  schemes.   The  basic  requirement  is 
simply  to  force  either  1  -  qx  or  y  -  qx  to  zero,  forming  q  in  any  convenient 
manner.   It  is  not  known  whether  some  combination  of  the  above  approaches,  or 
perhaps  an  entirely  new  approach,  would  lead  to  a  more  efficient  algorithm. 

A  similar  situation  is  observed  in  square  rooting,  as  discussed 
below.  Many  of  the  algorithms  for  other  functions  require  a  combination  of 
multiplicative  and  additive  techniques,  and  only  one  algorithm  is  known  for 
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these  functions.  For  the  most  basic  function,  multiplication,  only  the 
additive  scheme  leads  to  a  feasible  algorithm. 

2.2  The  standard  (redundant)  multiplication  scheme 

Suppose  one  wishes  to  form  the  product  of  two  floating-point  numbers 
X  and  Y.  The  exponent  of  the  product  P  presents  no  problem,  differing  by  at 
most  one.  from  the  sum  of  the  exponents  of  the  given  operands. 

P  =  YX  =  y  •  2P  •  x  .  2°  =  y*  •  2^1   p  •  2^ 

where  p  =  yx,  p  e    [l/^-,  l) .  For  multiplication,  one  proposes  the  following 

additive  scheme,  which  is  nothing  other  than  the  standard  (redundant) 

multiplication  algorithm. 

M      M 
P  =  yx  =  y(x  -  Z  m  +  Z  m  ) 
i=0     i=0 

where  m.  =  s.2   ,   s.  =  fl,  0,  1}  .  The  selection  rules  are  such  that 
1    l   '        l     '   ' 

M 
x  -  Z  m.  =  0 
i=0  X 

so  that 

M 
p  =  y  Z  m  . 
i=0 

Two  simple  recursions  are  performed  simultaneously,  but  independently. 


vi =  x "  A  mi 


k 

Z 

i=0 


k-1 

=  x  -  Z  m.  -  m 
1=0  1         ^ 


-\-\  (2-5) 
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k+l 


k 
y  Z  m 
i=0 

k-1 
y  Z  m  +  y  til 
1=0  X     * 


=  Pk  +  7   V 


(2-6) 


Thus  this  multiplication  algorithm  can  be  performed  in  an  average 
of  M/3  addition  cycle  times.  The  details  of  this  algorithm  are  discussed  in 
Section  k. 

It  can  be  seen  that  the  analogous  multiplicative  (continued  product) 

formulation  is  not  feasible.  Here 

M 
x 

P  =  yx  =  y 


x  n 

i=0 

m! 

i 

M 

n 

i=0 

m! 

i 

where  m!  =  1  +  s.2   ,    s.  =  fl,  0,  1} .   The  selection  rules  would  be  such 

l        l    '  l     '      ' 

that 

M 
1  -  x  H  ml  =  0 
i=0  X 

so  that 


y 


p  = 


M 

II  m'. 
1=0  x 


Clearly  it  is  not  efficient  to  form  p  in  such  a  manner.  Whether  some 
combination  of  the  multiplicative  and  additive  schemes  might  lead  to  an 
efficient  algorithm  is  not  known. 
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2.3  Normalization  schemes  for  square  rooting 

As  in  the  case  of  division,  four  basic  normalization  algorithms  to 
perform  square  rooting  are  known:   either  the  square  root  or  its  reciprocal 
can  be  formed;  each  can  be  formed  by  either  a  multiplicative  or  an  additive 
scheme.  Two  of  the  algorithms  must  be  scaled  indirectly  with  respect  to  the 
root  (directly  with  respect  to  the  operand),  one  simply  with  respect  to  the 
operand,  and  one  need  not  be  scaled  to  achieve  a  minimal  recoding. 

Suppose  one  wishes  to  find  the  square  root  of  X,  X  >  0.   It  is 
convenient,  as  a  preparatory  step,  to  normalize  in  a  manner  such  that  the 
exponent  is  even, 

X  =  x  •  if  Xe    [X'k>    ^ 

en'  even  integer. 

The  exponent  of  the  root  is  thus  a' /2  and  is  formed  by  shifting  a".  One  is 
thus  able  to  concentrate  his  efforts  on  the  fractional  part  of  the  operand 
and  formulate  a  convenient  algorithm  for  computing  r  =   V  x.  Let  us  begin 
by  considering  a  multiplicative  approach  that  is  quite  analogous  to  the  first 
division  scheme  proposed' in  Section  2.1. 

One  multiplies  the  given  operand  x  by  a  sequence  of  (non-zero) 
constants  {r.}  such  that  the  resulting  operand  tends  towards  unity. 


M 
x  n  r. 

1-0  x 

x  = 


M 
IT  r. 
i=0  x 


where 


r 


i\2 


=  (1  +  s.2   ) 


i         l 
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.     0-(i-l)    2  -21 

.  =  1  +  s.2       +  s.  2    , 


s.  =  {1,  0,  1} 


Thus, 


so  that, 


or 


M 
1  -  x  IT  r.  =  0. 
1=0  x 


M       M  •  ?   n 

n  r.  =  H  (1  +  s.2"1)  =  - 

1=0  X   1=0 


M  . 

IT  (1  +  s.2"1)  =  -i 

•  ^       i      "\/x 
1=0  v 


M         .    y 
7     IT  (1  +  s  2  X)  3.= 

1=0         x        v* 


M  .      __ 

x  IT  (1  +  s.2"1)  ="\/x. 

1=0      x 


The  selection  rules  for  the  choice  of  the  set  of  multiplier  constants 
{r.}  are  essentially  the  same  as  the  selection  rules  for  the  division  scheme 
represented  by  recursion  formula  (2-l).  To  carry  out  the  algorithm,  two  re- 
cursions are  performed  simultaneously  and  independently. 


R    =  1  -  x(  n  r  ) 
k+1        i=0  X 


1  - 


Vl 


=  x  "  Vk 


1  -  (1  -  Pk)rk 
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pk+l  =1'\  +   Vk 


-.     o-(k-l)    2  ^"2k 
But  r.  =  1  +  s,  2  v    '   +   s.  2  .  ,   so 
k       k  k     ' 


^i  =  Pv  +  0v-l)(2sv  +  s^  2-k)2"k 


k+1   ^k   XKk 


k    k 


(2-7) 


R,  _  =  x  IT  -Vr~ 
^+1     i=0   x 


x  n  (l  +  s.2"1) 

i=0       X 


=  x 


k-1 

n  (i  +  s.2"1) 

i=0       X 


(1  +  sk2"k) 


=  R.  +  s  R  2 
k    k  k 


-k 


(2-8) 


where  (3  approaches  zero  and  R  approaches  the  required  root.  Note  that  by 
initializing  R  to  an  arbitrary  value  y,  one  can  form  the  more  general 
quantity  y/-\[x~  rather   than  the  more  common  problem  of  finding  yx  (when 
y  =  x).   This  algorithm  is  discussed  in  complete  detail  in  Section  6. 

The  above  algorithm  is  a  multiplicative  formation  of  the  reciprocal 
of  the  square  root,  i.e., 


M  . 

n^7  =  -2=-. 

i=ov  x     -V7 

Such  a  multiplicative  formulation  leads  one  to  investigate  the  feasibility  of 
the  analogous  additive  reciprocal  root  formation,  represented  by  the  following 
recursions. 
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k 


Pk+l  =   X  " 


X(Z  v^T)' 

i=0 


=  1  - 


Vi 


where  r! 

1 


-i  2 
(s.2  )  .  Then 
l 


k-1 


-i     ~-kx2 


1=0 


=  1  -  x 


^f1    -iv2   _  0-k  ^    -i    2  0-2k 
(  L  s.2   )   +  2s  2     L     s.2   +  s  2 

1=0  X         K     i=0  x      K 


k-1 
1  -  x(  Z  s.2"1)2  -  2s,  x  2~\    +   s2 


1=0 


k    k 


pk  -  2skx  V    +  \  2 


(2-9) 


\+1  ■  £  VT 


k 

Z  s.2"1 
1=0  x 


k-1     .       . 
Z  s.2"1  +  Sl2"k 
1=0  x      k 


=  R.  +  s  2 

K.    k 


-k 


(2-10) 


where  (3  approaches  zero  and  R  approaches  the  reciprocal  of  the  square  root 
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of  the  given  operand.  From  recursion  (2-9)  one  can  see  that  this  algorithm 
cannot  be  performed  efficiently  because  the  indicated  multiplication  of  x  by 
R  is  indeed  a  full-scale  multiplication.  It  is  interesting  to  note  also 
that  the  given  operand  must  be  set  aside  in  a  special  register  throughout 
the  algorithm.  Furthermore,  in  order  to  achieve  a  minimal  recoding,  the 
comparison  constants  must  be  scaled  with  respect  to  the  root  of  the  operand, 
(in  practice  one  must  convert  this  scaling  to  a  scaling  with  respect  to  the 
operand  itself,  since,  after  all,  the  root  is  not  known).  This  algorithm 
seems  to  offer  no  redeeming  features  and  is  not  studied  in  detail  or  proposed 
for  implementation. 

Both  of  the  square  root  algorithms  discussed  to  this  point  are 
reciprocal  root  formation  schemes,  although  in  the  first  the  root  itself  can 
easily  be  formed  by  appropriate  initialization.  The  algorithms  discussed 
below  form  the  root  directly. 

An  algorithm  which  is  quite  similar  to  that  represented  by  recursion 
relations  (2-7)  and  (2-8)  but  which  forms  the  root  directly  is  that  represented 
by  the  following  recursion  relations. 


1=0 


k-1 
=  x  -  (  n  r  )r 

i=0  x  k 


=  x  -  (x  -  Pk)rk 


=  x  -  x  rk  +  Pkrk 


1  o-(k-l)    2  -2k 

But  r  =  1  +  s  2     '  +  s.  2   ,   so 

K  K  k 
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pk+i  -  \ +  \(\s~^~1]  +  4  £"2k) 


/     0-(k-l)  2     -2kv 

x(sk2  +  sk  2        ) 


^  Pk  +    (Pk   -  -)(23k  +   s*  2"k)2-k 


(2-11) 


k 

r.   .  =    n   Vr~ 
k+1     i=o       x 


k 

n    (l  +  s.2"1) 
i 

i=0 


k-1 

n    (i  +  s.2"x) 

1=0 


(1  +  sk2-k) 


R.    +   sn  R   2 
k         k  k 


-k 


(2-12) 


where  (3  approaches  zero  and  R  approaches  the  root  of  the  given  operand.  From 
relation  (2-ll)  one  can  see  that  the  comparison  constants  must  be  scaled  with 
respect  to  the  operand.  Since  this  algorithm  is  less  efficient  than  that 
first  discussed,  it  is  not  proposed  for  implementation. 

Finally,  one  may  consider  an  additive  algorithm  which  forms  the  root 
directly. 


'k+1 


k 

x  -  (  E 

i=0 


r!  )2 

l' 


k-1 


x 


-tA-V9!^' 


i=0 


k-1  k-1 

1=0   X      V   i=0  V 
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k-1 


.-k„     2  -2k 


■  pk "  sA2"(k_1)  -  4  2"2k  <2-w>' 


k    _ 

r,  .  =  E  -J7T 

^+1  1=0  V  1 

k 


=  E  s.2 

l-o  x 


-1 


=  R  +  s2"k  (2-14) 

k    k 


It  is  shown  in  Section  7  that  this  algorithm  must  be  scaled  indirectly  with 
respect  to  the  root  of  the  given  operand  (in  practice,  directly  with  respect 
to  the  operand  itself),  but  the  recursions  are  so  simple  and  convenient  that 
it  is  feasible  to  propose  an  implementation.  This  algorithm  is  discussed 
fully  in  Section  7. 

Hence,  as  in  division,  four  normalization  schemes  are  known;  possibly 
others  exist.  Two  algorithms  from  this  set  are  studied  in  detail  and  are 
proposed  for  possible  implementation  because  there  is  no  clear  case  to  be 
made  for  one  in  preference  to  the  other—one  is  preferable  in  the  strictly 
binary  case,  the  other  more  readily  amenable  to  a  higher  radix  implementation. 

2.h     Remarks  on  scaling 


With  regard  to  the  necessity  for  scaling,  the  following  table  sum- 
marizes the  observed  requirements. 


2k 
TABLE  2 

(Requirements  for  Scaling) 


Multiplicative 
(Continued  Product) 

Additive 
(Continued  Sum) 


yx 


I        1 


None   wrt  y 


None 


wrt  x 


None   None   wrt  x   wrt  ~Vx   wrt  ~\/x 


The  necessity  for  scaling  of  the  comparison  constants  in  order  to  achieve  a 
minimal  recoding  is  basically  an  observed  phenomenon  which  is  not  fully 
understood  at  this  time.   It  is  certainly  an  interesting  question  for  further 
study. 

2.5  Concluding  remark 

Most  of  the  remainder  of  this  paper  is  concerned  with  the  derivation 
of  those  algorithms  proposed  for  implementation,  a  discussion  of  convenient 
initialization,  a  brief  study  of  implementation  and  hardware  requirements, 
and  a  discussion  of  error  bounds  for  the  algorithms. 
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3.   THE  ALGORITHM  FOR  DIVISION 

3.1  Basic  algorithm 

In  Section  2.1.1,  a  division  scheme  is  proposed  which  is  discussed 
here  in  some  detail. 

It  is  assumed  that  one  is  attempting  to  formulate  an  algorithm  for 
computing  the  quotient  of  two  numbers  represented  in  the  usual  binary  format, 

X  =  x  •  2a  x,y  e  [l/2,  l) 

Y  =  y  •  2P  a, £  integers 

where  X  is  the  divisor  and  Y  is  the  dividend.  The  quotient,  as  well  as  the 
divisor  and  dividend,  is  assumed  to  lie  within  machine  range.  As  mentioned 
earlier,  the  exponent  of  the  quotient  presents  no  problem  and  one  may  con- 
centrate on  the  ratio  of  fractional  parts, 

x 

Let  us  multiply  both  divisor  and  dividend  by  a  sequence  of  (non-zero)  constants 
{d.}  chosen  in  such  a  way  that  the  resulting  divisor  tends  towards  unity. 

M 

x     M       J  .  n     l 

x  n  d.   1=0 
i=o  x 

M 
if  x  H  d.  =  1. 
i=0  x 

With  the  hardware  configuration  of  Figure  1  (Section  3.5),  both  the  partially 
normalized  divisor  x^  and  the  partially  developed  quotient  q  may  be  formed 
simultaneously  and  independently. 

25 


26 


*k+l  =  xo  A  di  =  VV   xo  =  x> 


qk+i  =  qo  .nn  di  =  qA'   qo  =  y> 

1=0 


-k 


where  d.  =  1  +  s.  2   ,   1  <  k  <  M, 


= 


1   if   x.  <  1  -  c 

k 

0  otherwise 

1  if   x.  >  1  +  c 


,-k 


The  initial  multiplier  d  is  chosen  in  a  special  way.   [See  Section  3.2.] 

It  is  known  that  a  minimal  recoding  (p  =  2/3)  will  result  if  the 
comparison  constant  lies  in  the  range  [2/3,  5/6],  the  most  convenient  choice 
for  c  in  this  range  being  3/^.  However,  it  is  also  convenient  to  bound  the 
variable 


upon  which  the  actual  comparisons  are  made,  to  lie  in  the  range  (-1,  +l),  so 
that  each  u,  can  be  represented  in  the  machine  with  only  a  sign  bit  to  the 
left  of  the  radix  point.   This  is  easily  accomplished  by  scaling  down  both 
the  comparison  constants  and  the  multipliers  by  a  factor  of  two. 
Let 


\- 


s,  = 


1  +  1/2  s  2"k  =  1  +  s,  2"(k+1),   1  <  k  <  M, 


1   if   x,  <  1  -  3/8  •  2"k 


0 


otherwise 


1   if   x^  >  1  +  3/8  •  2 


-k 


or  equivalently, 
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-{ 


1 

if 

%  <  -3/8 

0 

otherwise 

1 

if 

\  >  +3/8 

where 


\ 


=  2 


k 


(^-D 


as  previously  indicated.   It  is  shown  that   | tjl  j  <  1  for  k  =  0,  1, .  . ..,  M, 

within  the  desired  range  of  (-1,  +l) . 


3.2  Choice  of  initialization  step 

The  initial  operand  xQ  =  x  lies  in  the  range  [l/2,  l) .  The  object 

k 

of  the  normalization  process  is  to  force  x,    =  x„  H  d.   to  unity.  The 

k+1     0  i=Q   1 

selection  rules  for  the  set  of  multipliers  {d.}  are  essentially  symmetric 

about  unity,  the  only  lack  of  symmetry  lying  in  the  choice  of  the  placement 

of  equality  signs,  so  it  would  be  convenient  to  choose  the  initial  multiplier 

d  so  as  to  force  x  =  x  d   to  lie  in  a  range  symmetric  about  unity.  Thus 

the  following  choice  of  initial  multiplier  would  be  desirable. 

/ 

2  (shift)   if   1/2  <  x  <  2/3 

dn  =  ( 

if   2/3  <  x_  <  1. 


This  selection  would  leave  x,  in  the  range  (2/3,  V3)>  symmetric  about  unity, 
and  would  make  the  choice  of  s  =  1  and  s,  =  1  later  in  the  algorithm  equally 
likely.   However,  the  comparison  constant  of  2/3  specified  in  this  rule  re- 
quires a  full  scale  comparison  in  order  to  choose  d  .  A  less  convenient,  but 
much  more  practical  choice  of  initialization  step  is  the  following. 


do  = 


28 

2  (shift)   if 

1  if 


1/2  <  xQ  <  3A 
3A  <  xQ  <  1. 


Then, 

xx  e    [3A,  3/2) 

and  the  probabilities,  for  a  finite  register  length  (where  the  skewness 
in  the  probability  density  of  x^  has  not  yet  dissipated),  that  s  =  1  and 
s  =  1  are  not  equal.  However,  as  verified  by  experimental  means,  the 
probability  that  s  =  0  is  still  very  nearly  2/3,  and  that  is  the  critical 
factor.  Refer  to  Section  3.^-. 

Having  chosen  an  initialization  step,  the  algorithm  is  now  completely 
specified.  An  example  illustrates  the  flow  of  the  algorithm. 
Example :  Numerical  values  of  x  and  q  are  listed  below,  rounded  to  lU 


decimal  places.  The  divisor  x  =  0.6   =  0.10011001  (where  the  overbar 


indicates  a  repeated  digit  pattern)  and  the  divisor  y  =  0.5in  =  0.10000000. 
The  correct  quotient,  also  rounded  to  1^  decimal  places,  is  q=  0.83333333333333. 
For  this  divisor,  d  =  2,  so  that  x  =  1.2  and  q  =  1.0. 
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TABLE  3 


Vl 


lk+l 


11  " 

1  ■  ' 

1 

0 

1.20000000000000 

1.00000000000000 

2 

-1 

0.90000000000000 

O.75OOOOOOOOOOOO 

3 

1 

1.01250000000000 

O.8U375OOOOOOOOO 

'  k 

0 

1.01250000000000 

O.8U375OOOOOOOOO 

5 

0 

1.01250000000000 

O.8U375OOOOOOOOO 

6 

-1 

0.99667968750000 

0.83056640625000 

T 

0 

0.99667968750000 

0.83056640625000 

8 

1 

1.00057296752930 

0.83381080627441 

9 

0 

1.00057296752930 

O.8338IO80627U+I 

10 

0 

1.00057296752930 

O.83381080627441 

11 

-1 

1.00008^0651000 

O.833U0367209166 

12 

0 

1.00008^0651000 

O.833U0367209166 

13 

0 

1.00008^0651000 

0.833^0367209166 

Ik 

-1 

1.00002336620198 

O.83335280516832 

15 

-1 

0.99999284791078 

O.8333273T325898 

16 

0 

0.9999928U791078 

0.83332737325898 

17 

1 

1.000000U772507U 

O.83333373IOU228 

18 

0 

1. 000000  V772507U 

0.8333337310^228 

19 

0 

1.000000^7725074 

0.8333337310^228 

20 

0 

1.000000U772507U 

0.8333337310^228 

21 

-1 

1.0000000000U136 

0.83333333367780 

22 

0 

1.00000000004136 

0.83333333367780 

23 

0 

1.0000000000U136 

0.83333333367780 

2k 

0 

1.0000000000U136 

0.83333333367780 

25 

0 

1.00000000004136 

(Continued) 

0.83333333367780 
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TABLE  3  (Continued) 


k+1 


26 

0 

1.00000000004136 

27 

0 

1.00000000004136 

28 

0 

1.00000000004136 

29 

0 

1.00000000004136 

30 

0 

1.00000000004136 

31 

-1 

0.99999999994769 

32 

0 

0.99999999994769 

33 

0 

0.99999999994769 

34 

1 

1.00000000000590 

35 

0 

1.00000000000590 

36 

0 

1.00000000000590 

37 

-1 

0.99999999999863 

38 

0 

0.99999999999863 

39 

1 

1.00000000000044 

Ho 

0 

1.00000000000044 

*k+l 

0.83333333367780 
0.83333333367780 
O.8333333336778O 
0.83333333367780 
0.83333333367780 
0.83333333328975 
0.83333333328975 
0.83333333328975 
0.83333333333825 
0.83333333333825 
0.83333333333825 
0.83333333333219 
0.83333333333219 
0.83333333333370 
0.83333333333370 


Thus  the  quotient  produced  in  40  steps  of  this  algorithm  is 


Vl  =  °-83333333333370 

-12 
which  differs  from  the  correct  quotient  by  0.37  x  10   .  The  error  bound, 


-12 


derived  in  the  next  section,  is  0.45  x  10    for  M  =  40  steps. 
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3.3  Error  bound 

Given  an  initial  operand  x_=  x  in  the  range  [l/2,  l)  and  the 
selection  rules  listed  in  the  last  section  for  the  choice  of  the  set  of 
multipliers,  one  may  now  produce  a  bound  on  x^  ,  the  resulting  divisor,  and 

ultimately  an  error  bound  for  the  division  algorithm  itself.  Since, 

/ 


d 


0 


-  < 


then 


and 


2   if   1/2  <  xQ  <  3/U 
1   if   3A  <  xQ  <  1 


Xl  =  xodo  €  [3/u>  3/2) 


ux  =  2(Xl  -  1)  e  [-1/2,  1). 

Next  let  us  find  the  range  of  x  =  x  d  .  The  selection  rule  for  the  first 
(k  =  l)  step  is, 

d,  =  1  +  lA  s 

1   if   3/U  <  x1  <  13/16 

0  if   13/l6  <x1<   19/l6 

1  if   19/l6  <  x  <  3/2. 

The  first  range,  [3/k,    13/16)  maps  onto  5/U  [3/U,  13/l6)  or  [15/16,  65/6U); 

the  second  range,  [13/16,  I9/16)  maps  onto  itself;  the  third  range,  [19/16,3/2), 

maps  onto  3/U  [19/16,  3/2)  or  [5T/6U,  9/8).  Hence, 

x2  e  [13/16,  19/16), 

so  that  x  lies  in  the  middle  range  of  the  selection  rules  for  s  .   It  is 
proved  by  induction  that  similar  behavior  occurs  later  in  the  process. 
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Hypothesis:  For  k  >  2,  x^  e  [1  -  3/8  '  2~^k"1\      1  +  3/8  •  2~^~1'). 
The  hypothesis  has  been  shown  explicitly  to  be  valid  for  k  =  2. 

Proof:  For  some  k  >  2,   x^  e  [1  -  3/8  •  2"'k-1',   1  +  3/8  •  2_^k"1'))  or, 
equivalently,  x.  e  [1  -  3A  *  2~k,   1  +  3A  •  2~  )  •  For  all  k  >  2, 


dn  =  1  +  1/2  sn  2 
k  k 


-k 


1   if   x.  <  1  -  3/8  •  2 


= 


otherwise 


1   if 


^  >  1  +  3/8  •  2 


-k 


The  first  range, 


[l  -  3 A  •  2~k,  1  -  3/8  •  2"k) 


maps  onto 


or 


(1  +  1/2  •  2"k)  •  [1  -  3A  '  2~\  1  -  3/8  -  2~k), 


[1  -  lA  •  2"k  -  3/8  •  2"2k,   1  +  1/8  •  2"k  -  3/16  '  2"2k), 


which  lies  within  the  desired  range  of 


[1  -  3/8  •  2_k,  1  +  3/8  •  2"k) 


The  second  range, 


[1  -  3/8  •  2~\   1  +  3/8  •  2"k) 


maps  onto  itself,  and  thus  lies  within  the  desired  range, 
The  third  range, 


[1  +  3/8  •  2"\  1  +  3A  '  2"k) 
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maps  onto 

(1  -  1/2  •  2"k)    [1+3/8  •  2~k,   1  +  3/k    •  2'k) 
or 

[1  -  1/8  •  2~k  -  3/16  .  2"2k,  1  +  1 A  •  2"k  -  3/8  •  2~2k), 

which  again  lies  within  the  desired  range.  Hence, 

xk+1  e   [1  -  3/8  •  2~\  1  +  3/8  •  2"k)  ' 

for  all  k  >  2.  Furthermore,  |uj  <  3A  for  a11  k  >  2  and  k  |  <  1  for  all  k. 

Q.E.D. 

Therefore,  the  final  divisor 

Vl  e  [1  -  3/8  •  2"M,  1  +  3/8  -  2"M) 
so  that  the  error  in  the  final  divisor  is  bounded  by 

|xM+1  -  1|  <  3/8  •  2-M. 

The  algorithm  is  thus  capable  of  producing  M  correct  quotient  bits  in  M  steps 

-(M+l) 
beyond  the  initialization,  the  error  in  the  algorithm  being  less  than  2     -, 

neglecting  round-off. 

3.^  Experimental  estimate  of  speed 

The  theoretical  prediction  that,  with  the  selection  rules  chosen, 
the  probability  that  s  be  zero  is  2/3  is  an  asymptotic  result,  that  is,  it 
assumes  a  register  of  infinite  length.  To  test  the  value  of  the  algorithm 
in  any  practical  machine,  it  becomes  necessary  to  estimate  the  probability 
of  a  zero  for  a  relatively  short  register.  For  this  reason,  this  division 
algorithm  was  simulated  on  the  available  IBM  3^0/75  computer  system  for  a 
register  length  M  of  kO   bits,  and  a  Monte  Carlo  estimate  of  the  probability 


3h 


.18 


of  a  zero  was  made.  For  a  statistically  significant  sample  of  2   pseudo- 
random divisor  operands,  approximately  uniform  over  the  range  [l/2,  l),  the 
mean  probability  of  a  zero  is  O.676,  with  a  corresponding  shift  average  of 
3.08.  As  a  practical  matter,  these  figures  differ  very  little  from  their 
theoretically  predicted  asymptotic  values  of  2/3  and  3.00.  The  relevant 
statistical  considerations  are  discussed  in  Appendix  A. 


3.5  Imp leme  n t  at  i  on 

The  necessary  recursion  formulas  to  implement  the  division  process 
are  those  given  earlier,  namely, 


x.  _  =  xn  IT  d.  aii,   x  =  x, 
k+1    0  .  _  1    kk'   0 
i=0 


\+i  =  qo  .nn  di  =  W>    qo  =  y> 

1=0 
except  that  the  former  recursion  must  be  rewritten  in  terms  of  u  =  2  (x.  -  l) 
so  that  the  comparison  constants  remain  fixed  at  ±  3/8  (except  for  the  initial- 
ization) . 


Vi  =  2\  +  \  +  w 


(3-D 


o"(k+l) 
q.     ,    =   q,    +  2    x         /s1  q 

k+1         k  kk 


(3-2) 


where 


Sk  = 


if 


1       if 


^  <  -3/8 

otherwise 
\  >  +3/8. 
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To  implement  the  first  of  these  recursion  relations,  one  requires 
a  counter  (0  to  M)  to  keep  track  of  the  step  being  performed,  a  comparison 
circuit  to  select  the  value  of  s   (represented  by  a  sign  bit  and  a  magnitude 
bit),  a  register  to  hold  x,  (eventually  sc   ),  a  complementing  circuit  to  form 
the  negative  of  a  given  operand,  a  shifting  network  capable  of  rapidly  shift- 
ing k  places  (0  <  k  <  M),  a  full  adder  (which  probably  includes  the  register 
mentioned  above),  and  facilities  for  shifting  a  single  bit  position.  To 
implement  the  second  recursion,  one  requires  an  additional  register  to  hold 
q  (probably  falling  within  the  full  adder  again),  another  complementing 
circuit  and  shifting  network,  and  another  full  adder  as  indicated.  Of  course, 
control  circuitry  is  also  required,  but  that  may  be  provided  in  either  hard- 
ware or  microprogramming  form.  The  control  requirement  is  a  design  criterion 
which  is  highly  dependent  on  other  machine  design  parameters  and  cannot  be 
discussed  in  this  paper  in  any  detail. 

A  hardware  structure  sufficient  to  implement  all  algorithms  dis- 
cussed in  this  paper  has  been  developed  in  block  diagram  form.  The  flow  of 
information  for  the  division  algorithm  is  indicated  in  Figure  1.  Those  boxes, 
which  are  not  required  by  the  division  algorithm  are  shown  in  dashed  form; 
the  counter  and  comparison  circuitry  are  not  explicitly  shown. 

The  read-only  memory  indicated  in  Figure  1  is  required  by  several 
functions  in  the  set;  its  size  is  a  function  of  the  register  length,  as  well 
as  which  functions  in  the  set  are  to  be  implemented  in  the  machine.  To 
implement  all  of  the  functions  discussed,  a  total  of  slightly  more  than  Um/3 
words  need  be  stored  in  the  memory. 

The  "mini -adder"  is  an  adder  capable  of  adding  only  a  single  bit 
(in  any  digital  position)  to  a  full  precision  operand.  The  "one-bit  shifter" 
is  capable  of  shifting  left  one  digital  position. 
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3.6  Concluding  remark 

A  division  algorithm  using  an  implicit  redundant  recoding  and 
yielding  a  shift  average  of  approximately  three  has  been  proposed.   It  has 
been  shown  that  the  algorithm  can  be  realized  with  reasonable  cost  in  hard- 
ware, especially  with  the  expected  advances  in  hardware  technology  (LSI). 
It  is  shown  later  in  this  paper  that  the  structure  of  Figure  1  is  sufficient 
to  implement  the  other  elementary  functions  in  the  set  as  well. 

While  considerable  discussion  has  been  devoted  to  division  tech- 
niques, it  should  be  kept  clearly  in  mind  that  the  avowed  purpose  of  this 
research  was  not  to  develop  new  division  schemes,  although  that  has  been  done, 
but  rather  to  extend  the  techniques  used  in  those  schemes  to  the  elementary 
transcendental  functions. 


k.     THE  ALGORITHM  FOR  MULTIPLICATION 

4.1  Basic  algorithm 

In  Section  2.2,  it  is  shown  that  an  additive  normalization  scheme 
leads  to  a  feasible  algorithm  for  multiplication,  but  that  a  multiplicative 
normalization  scheme  does  not.  The  details  of  the  additive  scheme  are 
discussed  here. 

In  formulating  an  algorithm  for  multiplication,  one  need  only  be 
concerned  -with  the  product  of  the  fractional  parts  y  and  x  of  the  multiplicand 
Y  and  the  multiplier  X. 


M 


M 


p  =  yx 


=  y(x  -  E  m.  +  E  m. ) 


1=0  x  1=0  x 


where 


,-k 


n^  =  1/2  sk2"^,   sk  =  {1,  0,  1} 


The  selection  rules  must  be  chosen  in  such  a  way  that 


so  that 


M 
x  -  E  m.  =  0 
i=0  x 


M 
=  y  E  m.  . 


P  =  y 


i=0 


The  selection  rule  at  the  k   step  for  the  selection  of  s,  is  virtually  the 
same  as  that  for  division,  except  that  the  operand  is  being  forced  to  zero 
rather  than  unity. 

1   if   x,  <  -  3/8  •  2~k 


sk=/   0 


otherwise 


1   if   x^  >  +3/8  •  2 


-k 


38 


where 


39 


*k+l  =  xo  "  Zn  mi 

1=0 


k-1 

-E 


=  X-  -  A  m.  -  TIL 
0  i=0  x    ^ 


=  xk  ■  V 


x0  =  x 


Pk+1  -  *  £  »i 


k-1 

y  £  mi  +  y^k 


1=0 

=  pk  +  y\, 


v0=   o. 


Letting  u  =  2  x,  ,  one  may  write  the  two  recursions  which  must  be  implemented 
to  perform  multiplication. 


=  2 


Vi  =  ^k  -  sk 


pk+l  =  pk  +  ysk2 


-(k+1) 


(h-1) 


where  the  selection  rule  for  s,  now  reads, 

k 


sk=/  0 


1  if  ^  <  -3/8 
otherwise 
1   if  \>+   3/8. 


Note  that  a  register  must  be  provided  to  hold  the  original  multiplicand 
throughout  the  process. 


i+0 

k  .2     Choice  of  initialization  step 

The  initial  operand  xn  =  x  lies  in  the  range  [l/2,  l) .  The  object 

k 

of  the  normalization  process  is  to  force  x,  n  =  x_   -  Z  m.  to  zero.  The 

K+l    0   i=Q  1 

selection  rules  for  the  set  of  constants  {m.}  are  essentially  symmetric  about 
zero,  and  it  is  convenient  to  choose  the  initial  multiplier  m  so  as  to  force 
x  =  x  -  m  to  lie  in  a  range  symmetric  about  zero,  the  extent  of  the  range 
being  as  small  as  possible.  Further,  m  should  contain  no  more  than  a  single 
non-zero  bit  since  one  would  like  to  form 


Pl  =  P0  +  ^0 


in  no  more  than  one  addition  cycle  time.  The  choice  of 


mQ  =  J 


1/2   if   1/2  <  xQ  <  3/U 
1    if   3/U  <  xn  <  1 


satisfies  all  of  these  criteria  since  it  contains  only  a  single  non-zero  bit 
and  leaves  x..  in  the  range  [-l/h,    +l/U) .  It  is  shown  in  the  next  section 
that  such  a  choice  of  initialization  leads  to  a  convergent  algorithm. 

An  example  illustrates  the  flow  of  the  algorithm. 
Example :  With  a  multiplicand  of  0.5  and  a  multiplier  of  0.6,  the  correct 
product  is  0.3. 

For  this  multiplier,  m  =  l/2,  so  that  x  =  0.1  and  p  =  0.25. 


in 


TABLE  4 


Vl 


\+i 


~^~ 

1 

1 

0 

0.10000000000000 

0.25000000000000 

2 

1 

-0.02500000000000 

0.31250000000000 

3 

0 

-0.02500000000000 

0.31250000000000 

1+ 

-1 

0.00625000000000 

0.29687500000000 

5 

0 

0.00625000000000 

O.296875OOOOOOOO 

6 

1 

-0.00156250000000 

0.30078125000000 

7 

0 

-0.00156250000000 

0.30078125000000 

8 

-1 

0.00039062500000 

0.29980468750000 

9 

0 

0.00039062500000 

0.29980468750000 

10 

1 

-0.00009765625000 

0.30004882812500 

11 

0 

-0.00009765625000 

0.30004882812500 

12 

-1 

0.00002441406250 

0.29998779296875 

13 

0 

0.00002441406250 

O.29998779296875 

11+ 

1 

-0.00000610351563 

0.30000305175781 

15 

0 

-0.00000610351563 

0.30000305175781 

16 

-1 

0.00000152587891 

0.29999923706055 

17 

0 

0.00000152587891 

0.29999923706055 

18 

1 

-0.00000038146973 

0.30000019073486 

19 

0 

-0.00000038146973 

0.30000019073486 

20 

-1 

0.00000009536743 

0.29999995231628 

21 

0 

0.OOOOOOO95367 43 

0.29999995231628 

22 

1 

-0.00000002384188 

0.30000001192093 

23 

0 

-0.00000002384188 

0.30000001192093 

24 

-1 

0.00000000596046 

0.29999999701977 

25 

0 

0.00000000596046 

(Continued) 

0.29999999701977 

1+2 


TABLE  h   (Continued) 


\ 


26 

1 

27 

0 

28 

-i 

29 

0 

30 

i 

31 

0 

32 

-l 

33 

0 

3^ 

1 

35 

0 

36 

-1 

37 

0 

38 

1 

39 

0 

1+0 

-1 

Vi 

-0. 0000000011+9012 

-0.0000000011+9012 

0.00000000037253 

0.00000000037253 

-0.00000000009313 

-0.00000000009313 

0.00000000002328 

0.00000000002328 

-0.00000000000582 

-0.00000000000582 

0.0000000000011+6 
0.000000000001^6 

-0.00000000000036 
-0.00000000000036 
0.00000000000009 


fk+1 

0.30000000071+506 
0.30000000071+506 
0.2999999998137^ 
0.2999999998137^ 
0.30000000001+657 
0.30000000001+657 
0.29999999998836 
0.29999999998836 
0.30000000000291 
0.30000000000291 
0.29999999999927 
0.29999999999927 
0.30000000000018 
0.30000000000018 
0.29999999999995 


Thus  the  product  generated  by  the  algorithm  in  1+0  steps  is 


PM+1  =  0.29999999999995 


-13 


which  differs  from  the  correct  product  by  0,5  x  10   .  The  error  bound, 

-12 

derived  in  the  next  section,  is  0.3*+  x  10    for  M  =  1+0. 

It  may  be  observed  that  the  sequence  of  values  of  s  takes  on  a 


peculiar  pattern  for  this  multiplier:   01010101.   To  dispel  any  inference 
that  such  a  pattern  exists  for  all  multipliers,  let  us  change  the  multiplier 
slightly  and  repeat  the  example. 
Example :   With  y  =  0.5  and  x  =  0.6l,  p  =  O.305.   Also,  m  =  l/2,  x..  =   0.11, 


Px  =  0.25. 


h3 


TABLE  5 


Vl 


k+1 


■mm 

1 

1 

0 

0.11000000000000 

0.25000000000000 

2 

1 

-0.01500000000000 

0.31250000000000 

3 

0 

-0.01500000000000 

0.31250000000000 

1+ 

0 

-0.01500000000000 

0.31250000000000 

5 

-1 

0.00062500000000 

0.30468750000000 

6 

0 

0.00062500000000 

0.30468750000000 

7 

0 

0.00062500000000 

0.30468750000000 

8 

0 

0.00062500000000 

0.30468750000000 

9 

0 

0.00062500000000 

0.30468750000000 

10 

1 

0.00013671875000 

0.30493164062500 

11 

0 

0.00013671875000 

0.30493164062500 

12 

1 

0.0000146481+3750 

0.30499267578125 

13 

0 

0.  OOOOII+6I+8I+375  0 

0.30499267578125 

11+ 

0 

0.00001464843750 

0.30499267578125 

15 

1 

-0.00000061035156 

0.30500030517578 

16 

0 

-0.00000061035156 

0.30500030517578 

IT 

0 

-0.00000061035156 

0.30500030517578 

18 

0 

-0.00000061035156 

0.30500030517578 

19 

0 

-0.00000061035156 

0.30500030517578 

20 

-1 

-0.00000013351440 

O.305OOOO667572O 

21 

0 

-0.00000013351440 

O.305OOOO667572O 

22 

-1 

-0.00000001430511 

0.30500000715256 

23 

0 

-0.00000001430511 

0.30500000715256 

24 

0 

-0.00000001430511 

0.30500000715256 

25 

-1 

0.00000000059605 

(Continued) 

0.30499999970198 

kk 


TABLE  5  (Continued) 


*k+l 


26 

0 

0.00000000059605 

27 

0 

0.00000000059605 

28 

0 

0.00000000059605 

29 

0 

0.00000000059605 

30 

1 

0.00000000013039 

31 

1 

0.00000000013039 

32 

1 

0.00000000001397 

33 

0 

0.00000000001397 

34 

0 

0.00000000001397 

35 

1 

-0.00000000000058 

36 

0 

-0.00000000000058 

37 

0 

-0.00000000000058 

38 

0 

-0.00000000000058 

39 

0 

-0.00000000000058 

4o 

-1 

-0.00000000000013 

0.30499999970198 
0.30499999970198 
0.30499999970198 
0.30499999970198 
0.30499999993481 
0.30499999993481 
0.31+099999999302 
0.3^099999999302 
0.34099999999302 
0.30500000000029 
0.30500000000029 
0.30500000000029 
0.30500000000029 
0.30500000000029 
0.30500000000006 


4.3  Error  bound 

Given  an  initial  operand  x  =  x  in  the  range  [l/2,  l)  and  the 
selection  rules  listed  in  the  first  section  for  the  choice  of  the  set  of 
constants,  one  may  produce  a  bound  on  x^  ,  and  ultimately  an  error  bound  for 
the  multiplication  algorithm. 

It  has  already  been  noted  that,  with  the  previously  indicated  choice 
of  m 


Xx  €  [-1/4,  +1/4). 


Also, 


u^  =  2x  e  [-1/2,  +1/2) 


^5 

Following  the  example  of  the  division  algorithm,  one  may  construct  an  induc- 
tive error  bound  proof. 

Hypothesis:  For  k  >  1,  x^  e  [-3/8  •  2"^k_1%  +  3/8  *  2~(k~1'). 

The  hypothesis  has  been  shown  explicitly  to  be  valid  for  k  =  1. 

Proof:  For  some  k  >  1,  x^   e  [-3/8  '  2"^k_1%   +3/8  •  2"^"^)   or 
xk  e  [-3A  '  2~k,   +3/U  •  2"k).   For  all  k  >  1, 


\   -  ^  Sk2 


-k 


1   If   x.<  -3/8  •  2 


s,  =   <  0       otherwise 
k     ] 

1   if   xk  >  +3/8  •  2_k. 


The  first  range, 


maps  onto 


or 


[-3 A  •  2"k,  -3/8  •  2"k) 


[-3A  '  2_k,   -3/8  •  2"k)  +  1/2  •  2"k 


[-1A  •  2"k,   +1/8  .  2"k), 


which  lies  within  the  desired  range  of 


[-3/8  •  2_k,  +3/8  •  2_k) 


The  second  range, 


[-3/8  •  2"\  +3/8  •  2"k) 


maps  onto  itself,  and  thus  lies  within  the  desired  range. 


h6 

The  third  range, 

[+3/8  •  2~\   +3A  '  2"k) 
maps  onto 

[+3/8  •  2~k,  +3 A  *  2"k)  -  1/2  •  2"k 
or 

[-1/8  •  2~k,  +l/k   •  2"k) 
again  within  the  desired  range.  Hence, 

x^  e  [-3/8  •  2~k,  +3/8  .  2"k) 


for  a 


11  k  >  1.  Furthermore,  l^j  <  3/k   for  all  k  >  1  and  |uj  <  1  f or  all  k, 


since  u  e  [l/2,  l) 


Q.E.D. 

Therefore, 


? 


Vl'  ^  3/8  •  2"M. 


The  error  in  the  multiplication  algorithm, 

yXM+l 

is  thus  less,  in  magnitude,  than  3/8  •  2   and  the  algorithm  is  capable  of 
producing  M  correct  product  bits  in  M  steps  beyond  the  initialization. 

h.k     Experimental  estimate  of  speed 

The  Monte  Carlo  estimate  of  the  mean  probability  of  a  zero  is  0.68^4, 
with  a  corresponding  shift  average  of  3. IT. 


U7 


J+.5  Implement  at  i  on 

The  recursion  formulas  necessary  to  implement  the  multiplication 
algorithm  are  those  given  in  (^--l)  and  (U-2),  repeated  here  for  reference 
purposes. 

\+l   ■  Pk  +  ^k2^  ^-2) 

Except  that  a.  special  register  is  required  to  hold  the  multiplicand  y  through- 
out the  process,  the  hardware  required  to  perform  multiplication  is  virtually 
the  same  as  that  required  for  division. 

A  block  diagram  indicating  the  flow  of  information  required  in  the 
implementation  of  (^-l)  and  (h-2)   is  shown  in  Figure  2. 

k.6     Concluding  remark 

A  multiplication  algorithm  using  an  implicit  redundant  recoding  and 
yielding  a  shift  average  of  approximately  three  has  been  proposed.   It  has 
been  shown  that  this  algorithm  is  compatible  with  the  division  algorithm  in 
that  both  require  virtually  the  same  hardware  configuration. 
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5.  THE  ALGORITHM  FOR  NATURAL  LOGARITHM 

5.1  Basic  algorithm 

The  algorithm  to  compute  the  natural  logarithm  of  an  operand  X  >  0 
is  very  similar  to  the  division  algorithm  discussed  in  Section  3)    in  fact, 
the  normalization  process  is  identical.  Rather  than  forming  the  quotient  in 
the  second  adder  circuit,  one  merely  forms  the  sum  of  a  set  of  precomputed 
constants  drawn  from  a  register- speed  read  only  memory  [ROM],  the  ROM  being 
the  only  piece  of  hardware,  in  addition  to  that  required  by  division,  required 
to  implement  this  algorithm. 

Given  an  operand  X  =  x  •  2  ,  x  e  [l/2,  l),  one  may  write, 

in   X  =  in   x  +  a  in   2. 

The  second  term  in  this  expression  may  be  evaluated  in  one  multiplication 

cycle  time. 

Let 

N  N 

a   =  2   Z  m  ,  n^  =  sk2"  ,  sfc  =  {0,  1} 
i=0 

2-N    2+N 
where  N  determines  the  dynamic  range  of  the  machine,  (2   ,2   ).  Typically 

N  <  10;  if  N  =  10  the  dynamic  range  of  the  machine  is  (2     ,2     )  or 

roughly  (10    ,   10     ),  sufficient  for  most  problems.  Since  a   contains 

relatively  few  bits  (that  is,  normally  N  is  considerably  smaller  than  M),  it 

is  not  necessary  to  recode  a   in  order  to  speed  up  this  multiplication;  there 

is  no  advantage  to  completing  this  multiplication  before  the  computation  of 

in   x  is  accomplished.  A  standard  (n on -redundant)  multiplication  process 

suffices. 

a,  in   2  =  2  [(in  2)  Z  m.  ]. 

i=0  X 

h9 


50 
Letting 


k 

pk+1  =  (in  2)  E  m. 
1=0 

the  necessary  recursion  may  be  written, 

*W  ■  pk  +  sk(to  2)2~k>  50  ■  ° 

which  is  identical  to  .recursion  relation  {h-2)   with  y  =  ,0n  2.  Thus,  one 
adder  circuit  configuration  suffices  to  evaluate  a.   tin   2,  providing  only  that 
the  value  of  £n   2  is  stored  in  the  read  only  memory.   Simultaneously  and  in- 
dependently, the  logarithm  of  the  fractional  part  may  be  evaluated  using  two 
additional  adder  circuit  configurations;  three  such  configurations  are  re- 
quired in  all.  A  final  addition  is  required,  so  that  it  takes  one  addition 
cycle  time  longer  to  compute  the  natural  logarithm  than  to  perform  a  division. 

The  computation  of  a  In  2  requires  no  further  comment,  so  one  may 
concentrate  on  an  algorithm  to  compute  Zn   x,  x  e  [l/2,  l).  As  in  the  division 
algorithm  of  Section  3>  one  multiplies  the  operand  x  by  a  sequence  of  constants 
[£.},    conceptually  dividing  x  by  the  same  continued  product. 

M 

x  n  z. 
i=o  x 

x  = 


M 

IT  £. 
1=0  x 


where  Z,     -   d,  =  1  +  l/2  s  2   ,   1  <  k  <  M,  Z~   =  d._,  the  same  constants  as 
k    k  k   '    —       ~  0    0 

those  chosen  in  the  aforementioned  division  scheme.  Then, 

M  M 

Zn   x  =  Zn(x   IT  £ . )  -  in(  II  &.) 
i  i 

i=0         1=0 

M        M 
=  Zn(x   II  Z.)   +     Z  {-Sin   £.). 
1=0      i=0     x 


51 
A  power  series  expansion  for  Zn   6  is 


Zn   6  =  (6  -  1)  -  1/2  (8  -  l)2  +  1/3  (6  -l)3  -  l/k   {&  -l)k   +  . 


[0  <  6  <  2] 


It  is  known  from  Section  3.3  that 


,   M  -M 

x^   -  1|  =  |x  IE  i     -  1|  <  3/8  •  2  M. 

+x         i=0  x 


Then  to  machine  accuracy  (M  bits), 


M 
Zn   (x  IT  I.  )   =  0 

i=0  1 


and 


M 

Zn   x  =  E  (-£n  £. ). 
1 
i=0 

Providing  that  the  values  { -Zn  Z .}    are  precomputed  and  stored  in 
the  read  only  memory,  one  may  form  Zn   x  by  performing  the  normalization 
process  of  the  division  algorithm  of  Section  3  to  choose  the  set  {Z.)    with 
one  adder  configuration  while  forming  the  summation  of  the  appropriate 
stored  constants  with  a  second  adder  configuration. 

Note  that  when  s  =  0,  Z,    =1  and  Zn  Z,    =  0,  so  that  no  addition 
need  be  performed  in  the  either  adder. 


5.2  Choice  of  initialization  step 

Since  the  normalization  process  is  identical  to  that  of  the  pro- 
posed division,  it  suffices  to  choose  Zn  =  d  .  The  value  of  Zn  2  required 
by  this  choice  was  already  to  be  stored  in  the  memory. 

An  example  is  provided  to  illustrate  the  similarity  of  this 
algorithm  to  that  for  division. 
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Example :   Given  x  =  0.6,  £n   x  =  -O.5IO82562376599.  As  indicated  below,  the 

algorithm  produces  an  approximation  to  In   x  given  by  -O.5IO825623766I+U  which 

-12 
is  in  error  by  0.1+5  x  10    matching  the  error  bound  derived  in  the  first 

section. 

In  the  table  below,  L  represents  the  approximation  to  £,n   x  and 

xv  is  the  operand  being  normalized  to  unity. 


TABLE  6 


Vl 


1 

0 

1.20000000000000 

2 

-1 

0.90000000000000 

3 

1 

1.01250000000000 

h 

0 

1.01250000000000 

5 

0 

1.01250000000000 

6 

-1 

0.99667968750000 

7 

0 

0.99667968750000 

8 

1 

1.00057296752930 

9 

0 

1.00057296752930 

10 

0 

1.00057296752930 

11 

-1 

1.000081+ 1+0651000 

12 

0 

1.000081+ 1+0651000 

13 

0 

1. 00008 1+1+0651000 

111 

-1 

1.00002336620198 

15 

-1 

0.99999281+ 791078 

k+1 

-O.6931V718055995 
-O.U05I+65IO810816 
-O.5232U81U376I155 
-O.5232I+81I+376I+55 
-O.5232U81U376U55 
-O.507U99786796I+I 
-O.507I+99786796I+I 
-0.51139842721207 
-0.511398^2721207 
-0.511398U2721207 
-O.51091002671396 
-O.51091002671396 
-O.51091002671396 
-O.5108I+898969I+99 
-O.510818U7165119 


(Continued) 
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TABLE  6  (Continued) 


k 


^__ 

Ik 

"k+1 

k+1 

16 

0 

0.99999281+79 1078 

-O.5IO818I+7165H9 

17 

1 

1.0000001+772507^ 

-0.510826l010l662 

18 

0 

1.0000001+7725071+ 

-0.51082610101662 

19 

0 

1.0000001+7725071+ 

-0.510826l010l662 

20 

0 

1.0000001+7725071+ 

-0.51082610101662 

21 

-1 

1.0000000001+1336 

-O.5IO82562I+I7935 

22 

0 

1.0000000001+1336 

-0.510825621+17935 

23 

0 

1.0000000001+1336 

-0.510825621+17935 

2fc 

0 

1.0000000001+1336 

-0.510825621+17935 

25 

0 

1.0000000001+1336 

-0.510825621+17935 

26 

0 

1.0000000001+1336 

-0.510825621+ 17935 

27 

0 

1.0000000001+1336 

-0.510825621+17935 

28 

0 

1.000000000U1336 

-O.5IO82562I+ 17935 

29 

0 

1.0000000001+1336 

-O.5IO82562I+I7935 

30 

0 

1.0000000001+1336 

-O.5IO82562I+I7935 

31 

-1 

0.9999999999^769 

-0.51082562371368 

32 

0 

0.9999999999^769 

-O.5IO82562371368 

33 

0 

0.9999999999^769 

-O.5IO82562371368 

3^ 

1 

1.00000000000590 

-0.51082562377189 

35 

0 

1.00000000000590 

-0.51082562377189 

36 

0 

1.00000000000590 

-0.51082562377189 

37 

-1 

0.99999999999863 

-O.5IO82562376U61 

38 

0 

0.99999999999863 

-O.5IO82562376I+61 

39 

1 

1.0000000000001+1+ 

-O.5IO825623766I+I+ 

1+0 

0 

1.0000000000001+1+ 

-O.5IO825623766I+I+ 

5h 

5.3  Error  bound 

It  has  been  shown  that  the  algorithm  provides  M  correct  bits  in 
the  value  of  #n  x,   neglecting  machine  round-off. 

5 .h     Experimental  estimate  of  speed 

The  value  of  #n  X  can  be  computed  in  one  addition  cycle  time  beyond 
that  required  for  division,  or  approximately  1  +  M/3  addition  cycle  times,  on 
the  average. 

5 .5  Implementation 

The  implementation  of  the  evaluation  of  a  |n  2  is  discussed  in 
Section  5.1.   The  recursion  relation  for  the  normalization  process  is  identi- 
cal to  (3"l)  for  division. 

Vi =  2\  +  sk  +  sk\2"k  (5-1} 

The  second  recursion  relation  is  simply  the  continued  summation  of  a  set  of 
stored  constants. 

Lk+1  "  .^  ("in  'l> 

k-1 

=  Z  (-£n  H.)   +   (-in  z    ) 

=Lk+  [-in  (1  +  sk2"(k+l))]  (5-2) 

A  block  diagram  indicating  the  flow  of  information  is  shown  in  Figure  3. 

That  only  a  portion  of  the  set  of  precomputed  constants 
{-in  (l  +  s  2  ^     )}  need  be  explicitly  stored  in  the  ROM  is  easily  seen. 
Consider  again  the  power  series  expansion 
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£n  (1  +  &)  =  5  -  1/2  82  +  1/3  63  -  l/U  8  +  ... 

[-1  <  8  <  +1] 
where  6=  s^^K      If  k  >  [&*],* 

£n  (1  +  6)  =  8  =  sk2_(k+l) 

to  machine  accuracy  and  the  constants  need  not  actually  be  stored. 

5.6  Concluding  remark 

An  algorithm  for  computing  the  natural  logarithm  has  been  proposed; 
the  algorithm  is  virtually  identical  to  the  proposed  division  algorithm, 
except  that  a  small  ROM  is  required.  The  constants  that  must  be  precomputed 
and  stored  are 

£n  2 

-m  (i±2-(k+1))         k-i,  a,  ...,  [*p]. 

A  listing  of  approximate  decimal  equivalents  of  the  required  precomputed 
constants  is  given  in  Appendix  B. 

An  algorithm  for  logarithm  to  any  positive  integer  base  could 
easily  be  formulated  by  the  same  procedure;  new  sets  of  precomputed  constants 
would  be  required. 


'Ml2l 


*  The  largest  integer  not  greater  than  (— p  ) . 


6.  FIRST  ALGORITHM  FOR  SQUARE  ROOT 

6.1  Basic  algorithm 

In  Section  2.3  it  was  shown  that  four  normalization  square  root 
algorithms  are  known;  two  of  these  are  studied  in  this  paper.  The  algorithm 
studied  in  this  section  is  a  multiplicative  (continued  product)  formation  of 
the  quantity  y/~\pT.     The  process  is  effectively  the  same  as  the  proposed 
division  algorithm,  the  only  difference  being  that  the  multiplier  constants 
{r.}  are  the  squares  of  the  multiplier  constants  chosen  for  division.  Let 


M 

x  IT  r . 

1=0  x 
X  ~     M 

IT  r. 

1=0  x 


where 


X  €  [1/2,  1) 


r.  =  (1  +  1/2  s,  2  *)%   1  <  k  <  M,    b.  =  {1,  0,  1} 


k   '    >  -       -  '  k 


and  {r.}  is  chosen  such  that 


M 
x  IT  r.  =  1. 
1=0  x 


Then, 

M 


TI  r.  S   - 
i=0  x      x 

and 

M 
y  n  (1+  X/2   s  2"1)  S-t-  . 
i=0  ^ x 

The  recursion  relation  for  the  normalization  of  the  operand  x  is  complicated 
by  the  fact  that  the  multipliers  have  three  terms  rather  than  two. 
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k 

x.  ,  =  x.  IT  r.  x^  =  x 

K+l    0  .  _   1  0 

i=0 

=  (x   n  r  )(1  +  1/2  s  2  Kr 

Ui=0  X  k 


R.  ,  =  R^  TI  Vr.         R^  =  y 
k+1    0  .  _  v  l  0   J 

i=0 


k-1 
=  (R   H  -V7T)(1  +  1/2  s  2"K) 

U  i=0    x  K 


=\   (1+  sk2"(k+1)).  (6-2) 

The  multiplier  constants  for  the  division  algorithm  are 

d,=l+  1/2  s.  2"k 

k  k 

whereas  those  chosen  in  this  square  root  algorithm  are 

,      -k    2  -2(k+l) 

r.  =  1  +  sn  2   +  s.  2   v 
k        k       k 

The  dominant  terms  (other  than  unity)  in  the  multipliers  differ  by  a  factor 
of  two;  for  this  reason,  the  comparison  constants  must  also  differ  by  a 
factor  of  two  to  achieve  the  same  recoding.  A  simple  change  in  notation  or 
a  comparable  change  in  the  implementation  transformation  can  overcome  this 
minor  discrepancy;  the  latter  is  chosen  as  being  conceptually  neater;  in 
practice,  the  two  are  the  same.   Thus,  in  terms  of  the  partially  normalized 
operand  x,  ,  the  selection  rules  are  chosen  as  follows. 

rk  =  (1  +  1/2  sk2"k)2 


sk  = 


/ 

1 

1  ° 
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if      xk  <  1  -  3/k 


otherwise 
if   x^  >  1  +  3/U 


.-k 


The  implementation  transformation  in  the  division  algorithm  is  given  by 


\  ■ 2  K  -  « 

whereas  the  implementation  transformation  in  this  algorithm  is  taken  as 

so  that  \vl\   <   1,  as  is  shown  in  Section  6.3.  The  selection  rules,  in  terms 
of  vl,   may  thus  be  written  in  exactly  the  same  form  as  that  for  division: 

'   1   if   u^  <  -3/8 


^ 


otherwise 


1   if   u^  >  +3/8, 


6.2  Choice  of  initialization  step 

The  initialization  step  comparable  to  that  chosen  for  the  division 

2 
algorithm  is  r  =  d  ,  that  is, 


0    0' 


ro  = 


U   if   l/U  <  xQ  <  3/U 

1   if   3/U  <  xQ  <  1 


which  leaves  x  =  x  r  in  the  range  [3/U,  3)  and  u  =  x-,    -   1  €  [-l/U,  2), 
outside  the  desired  range,  (-1,  +l).  Merely  changing  the  initial  comparison 
constant  provides  a  more  acceptable  range. 


r  = 
0 


U   if   l/U  <  xQ  <  1/2 
1   if   1/2  <  xQ  <  1. 


6o 

Thus,  x  e  [l/2,  2),      u  e  [-1/2,  l).   It  is  shown  in  the  next  section  that 
this  choice  of  initialization  leads  to  a  convergent  algorithm. 


Example :  An  example  computing  0.5/  V0.6  =  O.6I+5I+9722I+3679O  is  tabulated 

below.   It  may  be  seen  that  the  algorithm  produces  an  approximation  which  is 

-12  -IP 

in  error  by  0.11  x  10   ,  well  within  the  error  bound  of  0.68  x  10  "  "  derived 

in  the  next  section. 


TABLE  7 


1: 


1 

1 

2 

0 

3 

0 

1+ 

1 

5 

0 

6 

0 

7 

0 

8 

1 

9 

0 

10 

-1 

11 

0 

12 

0 

13 

0 

Ik 

1 

15 

0 

0.93750000000000 
0.93750000000000 
0.93750000000000 
0. 9970092773^375 
0.9970092773^375 
0.9970092773^375 
0.9970092773^375 
1.00090761+81219^ 
1.00090761+812191+ 
0.999930U3788180 
0.999930^3788180 
0.999930U3788180 
0.999930U3788180 
0.999991^6972357 
0.999991^6972357 


Vi 

0.62500000000000 
0.62500000000000 
0.62500000000000 
0.61+1+53125000000 
0.6I+1+5  3125  000000 
0.61+ 1+53125000000 
0.61+ 1+5  3125  000000 
0.61+579010009766 
0.61+579010009766 
0.6^51+71+7729003'+ 
0.61+5I+71+7729003U 
0.6^51+71+77290031+ 
0.6^51+71+77290031+ 
0.61+51+91+1+7122715 
0.6I+5I+9UI+7122715 


(Continued) 
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TABLE  7  (Continued) 


k 


_^_ 

^k 

\+l 

k+1 

16 

0 

0.999991^6972357 

0.61+51+91+ 1+7122715 

IT 

1 

0.99999909906757 

0.61+51+9693359315 

18 

0 

0.99999909906757 

0.61+51+9693359315 

19 

0 

0.99999909906757 

0.61+51+9693359315 

20 

1 

1.00000005271+126 

0.61+51+9721+139007 

21 

0 

1.0000000527^126 

0.61+51+9721H39007 

22 

0 

1. 00000005  271+126 

0.61+51+9721+139007 

23 

0 

1.00000005271+126 

0.61+51+9721+139007 

2U 

-1 

0.99999999313661 

0.61+51+9722215275 

25 

0 

0.99999999313661 

0.61+51+9722215275 

26 

0 

0.99999999313661 

0.6I+5 1+9722215  275 

27 

1 

1.00000000058719 

0. 61+51+97221+557^2 

28 

0 

1.00000000058719 

0.61+51+97221+5571+2 

29 

0 

1.00000000058719 

0.61+51+97221+557^2 

30 

0 

1.00000000058719 

0.61+51+97221+5571+2 

31 

-1 

1.00000000012153 

0.61+51+97221+1+0713 

32 

0 

1.00000000012153 

0.6U51+97221+1+0713 

33 

-1 

1.00000000000511 

0.61+51+97221+36955 

3^ 

0 

1.00000000000511 

O.6I+5 1+9722 1+3695  5 

35 

0 

1.00000000000511 

0.61+51+97221+36955 

36 

0 

1.00000000000511 

0.61+51+97221+36955 

37 

0 

1.00000000000511 

0.61+51+97221+36955 

38 

-1 

1.0000000000011+8 

0.61+51+9722U36838 

39 

-1 

0.99999999999657 

0.61+51+97221+36779 

1+0 

0 

0.99999999999657 

0.61+51+97221+36779 
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6.3  Error  bound 

It  has  already  been  shown  that 


xQ  e  [lA,  1),   uQ  e   [-3/8,  0) 
x±   e  [1/2,  2),        u^  e  [-1/2,  l). 


Next  let  us  find  the  range  of  x  .  The  selection  rule  for  the  first  (k  =  l) 


step  is  given  by, 


8   -   /  0 


r  =  1  +  1/2  s  +  l/l6  s^ 

1   if   1/2  <  x±  <   5/8 
if   5/8  <  x  <  11/8 


I  1   if   11/8  <  x  <  2. 


The  first  range, 

[1/2,  5/8) 
maps  onto 

(1  +  1/2  +  1/16)  •  [1/2,  5/8) 
or 

[25/32,  125/128). 
The  second  range, 

[5/8,  11/8) 
maps  onto  itself. 
The  third  range, 

[11/8,  2) 
maps  onto 

(1  -  1/2  +  1/16)  •  [11/8,  2) 


or 


[99/128,  9/8). 
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Hence, 

x2  e  [5/8,  11/8),   u2  6  [-3A,  +3A) 

so  that  x  lies  in  the  middle  range  of  the  selection  rules  for  s  and 
|u  J  <  1.   It  is  easy  to  show  that 


XM+1  "  Xl  ^  3/U  *  2"  >    l\l  <  1  for  a11  k' 


Hypothesis:  For  k  >  2,   x,  €  [1  -  3A  '  2~^~1\      1  +  3/1+  •  2~^k~10. 

The  hypothesis  has  been  shown  to  be  true  for  k  =  2.   The  induction  proof  for 

the  behavior  for  k  >  2  is  virtually  identical  to  that  in  Section  3.3  for 

division,  except  for  the  perturbation  caused  by  the  second  order  term  in  the 

multiplier. 

Proof:  For  some  k  >  2,   x,  e  [1  -  3/2  1   2~k,   1  +  3/2  •  2~k) .  For  all  k  >  2, 

n      -k   .,  /,   2  -2k 
r.  =  1  +  s  2   +  1/k   s  2 


1 


k      '   k 
if   x.  <  1  -  3A  •  2_k 


sn  =  <  0       otherwise 
k 


The  first  range, 


maps  onto 


1   if   XL  >  1  +  3A  '  2"  . 


[1  -  3/2  •  2"k,   1  -  3/k    •  2"k) 


[1  -  2"k(l/2  +  5 A  *  2~k  +  3/8  •  2"2k), 

1  +  2"k(l/U  -  1/2  •  2_k  -  3/16  •  2~2k)). 


But  for  k  >  2, 


1/2  +  5A  '  2'k  +  3/8  •  2"2k  <  339/512  <  3A 


61+ 
so  here 

xk+1  €  [1  -  3A  *  2~k,     1  +  3A  •  2"  ) 

The  middle  range, 

[1  -  3 A  •  2"\    1  +  3 A  •  2"k) 

maps  onto  itself. 
The  last  range, 

[1  +  3A  '  2"k,   1  +  3/2  •  2"k) 


maps  onto 


[1  -  2~k(lA  +  1/2  '  2"k  -  3/16  •  2"2k), 

1  +  2_k(l/2  -  5A  '  2"k  +  3/8  '  2"2k)) 


which  clearly  lies  within  the  desired  range.  Thus, 

-M 


and 


Let 


so  that 


and 


Then 


*M+1  "  Xl  *  3/U  •  2 


|uj  <  1     for  all  k. 


M     

x  =  n  "Vr- 

i=0 


,2 


x  X2  -  1|  <  3A  •  2"M. 


:(X  +  -pr)U  --±r  I  <  3A  '  2"M. 
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Let  e  be  defined  by 

where  e  represents  a  relative  error  in  the  approximation  to  1/  -\Tx;    |ej  «  1, 
Then, 

x(^)|^|<3A-2-M 

or 

(2+e)|e|  <3A  '  2~M. 

Since  |e|  «  1,  one  may  write  approximately 

Id  <3/8  -2-M. 

Thus, 


X  -ft   <  ^(3/8   •  2"M) 


or 


|x  -  4=|  <  3A  •  2"M. 

Finally,  since  y  <  1, 


that  is,  in  order  to  guarantee  M  correct  bits  in  the  value  of  y/^J~x~,    one 
must  perform  M  4-  1  steps  beyond  the  initialization. 

6.k     Experimental  estimate  of  speed 

The  Monte  Carlo  estimate  of  the  mean  probability  of  a  zero  is 
O.669,  with  a  corresponding  shift  average  of  3.0*+. 
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6.5  Implementation 

Rewriting  recursions  (6-1)  and  (6-2)  under  the  transformation 
xl    =   2   (x,  -l),  one  obtains, 

Vi  ■  <2\  +  V +  2"k<ask\ +  V*  $ +  2"2k<1/2  S'V> 

uQ  =  l/2(x-l)  (6-3) 

Rk+i  "  Ek  +  1/2  sk\2"k>        Ro  "  y-  <6-u' 

Figures  h,    5,  and  6  show  possible  hardware  configurations  to  implement  these 
recursion  relations. 

6.6  Concluding  remark 

Further  comments  about  this  algorithm  are  included  in  Section  7.6 
where  a  comparison  of  the  two  square  root  algorithms  is  made. 
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7.   SECOND  ALGORITHM  FOR  SQUARE  ROOT 

7.1  Basic  algorithm 

Let  us  consider  in  this  section  an  additive  square  root  algorithm 
which  can  be  performed  in  approximately  the  same  amount  of  time  as  the 
division  algorithm  and  with  essentially  the  same  hardware,  but  which  requires 
a  scaling  of  the  comparison  constants  to  achieve  a  minimal  recoding.  It  is 
somewhat  less  general  than  the  algorithm  studied  in  the  last  section  in  that 
the  root  itself,  rather  than  y/  ~\fx,    is  evaluated. 

The  normalization  process  one  wishes  to  carry  out  is 


7k+l  =:  7k  "  rk 


where 


V^ 


r.  =  1/2  s  2~k 

k       k 


j'  1   if   7l,  <  -c  *  2"1 


k 


sn  =  (  0       otherwise 
k 


so  that 


1   if   7  >  +c  •  2_k 


c  e  [1/3,  5/12) 


7M+1  =  ° 


M         _ 

Vi =  A  ri  =  v^- 


i=0 


One  would  achieve  an  asymptotic  shift  average  of  three  with  this  recoding, 
While  this  is  a  conceptually  neat  algorithm,  it  cannot  be  carried  out  in 
practice  in  this  form  since  ~\l~x   is  unknown  and  the  recursion  cannot  be 
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initialized.  Rather,  one  must  perform  recursions  with  respect  to  a  known 
quantity  such  as 


\+l 


=  x. 


k 

E  r 
i=0  " 


X0  =  x 


where 


rk  =  1/2  sk2 


-k 


Sk  = 


if   xn  <  -c' 


otherwise 


if 


*k^ 


>   +c' 


,-k 


,-k 


One  must  thus  find,  a  suitable  relationship  between  c'  and  c  that  yields  a 
minimal  recoding.  Now, 


7 


k+1 


\+l 


■yfx-  R 


k+l 


=     x  -  R 


k+l 


so 


\+i  =  >W(  V^+  Rk+i) 

The  problem  thus  becomes  one  of  choosing  a  convenient  value  for 

c'  =  2"\/*x"c,   c  g  [1/3,  5/12).  As  an  example,  suppose  yx  =  l/2  (x  =  l/M; 

then  c'  =  c  =  3/8  is  the  most  convenient  choice;  if  yx  =  1  (x  =  l),  then 


c'  =  2c  =  3A  is  convenient.  Metze  solved  a  similar  scaling  problem  for 

the  SRT  division;  he  showed  that  at  least  four  regions  are  needed  to  cover 

3  ^ 

the  range,  whose  endpoints  are  in  the  ratio  2/l,  since  (5A)  <  2,  (5A)  >  2. 
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To  reduce  the  precision  of  the  comparisons,  five  regions  are  recommended, 
Convenient  choices  for  comparison  constants  for  the  various  ranges  of  x 
are  listed  in  Table  8. 

TABLE  8 


X 

& 

Comparison  Constant  c1 

LV  16; 

[0.500,  0.558) 

3 

8 

Ll6»  8j 

[0.558,  0.612) 

4 

16 

L8'  l6; 

[0.612,  0.750) 

1 
2 

Ll6>  8j 

[0.750,  0.935) 

5 

8 

[J,D 

[0.935,  1) 

3 
1+ 

7.2  Choice  of  initialization  step 

Let  us  consider,  for  a  moment,  the  recursions  necessary  to  imple- 
ment this  algorithm. 


\+l 


r 


k 

Z     r. 
0         i=0     x 


xQ  =  x 


X0   " 


xo  " 


k-1 

Z     r.    +  r. 
1=0    x        ki 


k-1 
Z     rv 
i=0     " 


"   sk2"\ 


2   -2(k+l) 
V 


-k  -  w*  -  #"2(k+1) 


(7-1) 
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k 
R,  .  =  Z  r,  Rn  =  0 

k+1     .  n        1  0 

1=0 

=R+s?-(k+1>  (7-2) 

k    k 

Rewriting  (7~l)  under  the  transformation  u,  =  2     x  ,  chosen  to  force 

\\\  <  i, 

V  =  2\-^sA+sk2"M'.     uo  =  1^V     (7-3) 

If  it  can  be  guaranteed  that  R  has  a  zero  in  bit  position  (k  +  2)  then  the 

2   -fk+2) 
single  bit  represented  by  s  2       can  simply  be  inserted  in  s  R  and  the 

value  (s,R,  +  s  2      )  can  be  shifted,  complemented,  and  added  to  2vl. 

If  the  initialization  constant  r  is  a  low  precision  number,  and  if  radix 

complement  notation  is  used  for  negative  numbers,  s  R  will  indeed  have  a 

zero  in  bit  position  (k  +  2)  as  desired.  This  minimizes  the  delay  time 

caused  by  the  mini-adder. 

Since  the  comparison  constant  for  the  algorithm  is  a  function  of 

the  operand,  the  initialization  constant  r  is  also  allowed  to  be  a  function 

of  the  operand. 


TABLE  9 


r 
o 


U'  16;        2 


_9 

16 

5 

8 

3 

k 


tf- 

I) 

8; 

* 

it> 

Ll6' 

J) 

* 

1) 

7^ 


It  is  shown  in  the  next  section  that  these  choices  for  initialization  lead 
to  a  convergent  algorithm. 

An  example  is  listed  below. 
Example :   If  x  =  0.6,  then  ~\fx~  =   O.77U5966692U1I+8.   In  kO   steps,  the 

algorithm  produces  an  approximation  to  v  x  which  is  in  error  by  0.6  x  10   , 

-IP 
well  within  the  worst  case  error  bound  of  O.U5  x  10   (derived  in  the  next 

section) . 


TABLE  10 


1 

0 

2 

0 

3 

0 

k 

0 

5 

1 

6 

1 

7 

0 

8 

0 

9 

1 

10 

0 

11 

0 

12 

1 

13 

1 

Ik 

0 

15 

0 

16 

0 

17 

0 

18 

0 

19 

0 

20 

-1 

0.03750000000000 

0.03750000000000 

0.03750000000000 

0.03750000000000 

0.01381835937500 

0.00179^3359375 

0.00179^3359375 

0.00179^3359375 

0.00028285980225 

0.00028285980225 

0.00028285980225 

0.00009377896786 

-0.00000077262521 

-0.00000077262521 

-0.00000077262521 

-0.00000077262521 

-0.00000077262521 

-0.00000077262521 

-0.00000077262521 

-0.00000003391201 

(Continued) 


k+1 

0.75000000000000 
0.75000000000000 
0.75000000000000 
0.75000000000000 
O.765625OOOOOOOO 
0.773^3750000000 
0.773^3750000000 
0.773^3750000000 
O.77I+I+1U0625OOOO 
0.77^lU06250000 
O.77I1U1U0625OOOO 

0. 77^53613281250 

0.77^59716796875 
0.77^59716796875 
0.77^59716796875 
0.77^59716796875 
0.77^59716796875 
0.77^59716796875 
0.77^59716796875 
0.77^59669113159 
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TABLE  10  (Continued) 




JL 

~k+l 

21 

0 

-0.00000003391201 

22 

0 

-0.00000003391201 

23 

0 

-0.00000003391201 

2k 

0 

-0.00000003391201 

25 

-1 

-0.00000001082723 

26 

-1 

0.00000000071516 

27 

0 

0.00000000071516 

28 

0 

0.00000000071516 

29 

0 

0.00000000071516 

30 

1 

-0.00000000000624 

31 

0 

-0.00000000000624 

32 

0 

-0.00000000000624 

33 

0 

-0.00000000000624 

3k 

0 

-0.00000000000624 

35 

0 

-0.00000000000624 

36 

0 

-0.0000000000062)4 

37 

-1 

-0.00000000000060 

38 

0 

-0.00000000000060 

39 

0 

-0.00000000000060 

Uo 

-1 

0.00000000000010 

k+l 
0.77^59669113159 
0.77^59669113159 
0.77^59669113159 
0.77^59669113159 
0.77^596676230^3 
0.77^59666877985 
0.77^59666877985 
0.77^59666877985 
0.77^59666877985 
0.77^5966692^551 
0.77U5966692^551 
0.77^59666924551 
0.77^59666924551 
0.77459666924551 
0.7745966692U551 

0. 77^59666921+551 

0.77^5966692i+l87 
0.7745966692U187 
0.77459666924187 
O.7745966692I+IU2 


7.3  Error  bound 

In  this  section  it  is  shown  that  the  error  in  the  approximation 
to  the  root  is  bounded  by  2~^  +  ',  so  that  M  correct  bits  in  the  root  are 
produced  in  M  steps  beyond  the  initialization. 

The  first  step  of  the  proof  consists  of  producing,  by  direct  com- 
putation, bounds  on  the  first  few  approximations  to~yx.   The  proof  is  then 
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completed  by  induction.     Recall, 


\  =   Vx""  \,  70  =    ~\[Te   [1/2,    1) 


Table  9  is  then  completed  to  yield  Table  11. 


X 

/x 

LU'  i6j 

[0.500, 

0.558) 

Ll6'  8; 

[0.558, 

0.612) 

L8'  16; 

[0.612, 

0.750) 

Ll6'  8; 

[0.750, 

0.935) 

[J.  1) 

[0.935, 

1) 

TABLE 

11 

r 
o 

yn    =  vx  -  r 
1                     o 

2 
xn    =  x  -  r 

1                   0 

1 
2 

[0,    +0.058) 

[°-i*> 

9 
16 

[-0.00U5,    +0.0U95) 

L   256'  256; 

5 
8 

[-0.013,    +0.125) 

r    x    ^ 

3 
1* 

[0,  +0.185) 

c°-i|j 

[-0.065,  0) 


[■£.  0) 


Hence, 


7;L  e  [-O.065,   +O.I85) 


By  looking  at  the  selection  rules  for  the  first  (k  =  l)  step,  one  can  see 
that  s,  =0,   r  =  0,  independent  of  x  (  a  virtue  of  the  initialization 
chosen) .  Then 


72  e  [-O.O65,    +O.I85). 


Hypothesis 


7kl  <2 


-k 


for  all  k.   The  hypothesis  has  been  shown  to  be 


true  for  k  =  0,  1,  2.  Assume  \y    j  <  2   for  some  k  and  show 
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th 


Proof:  For  the  k   step, 


rR  =  1/2  %2 


-k 


sk  = 


1   if   x.  <  -c'  •  2 

0  otherwise 

1  if   x^  >  +c*  •  2 


-k 


-k 


where  c'  =  2^/x c,    c  e  [  1/3,  5/12). 

-k  — 
Range  1:   Suppose  xn  <  -c'  •  2  so  that  s,  =  1,   rn  = 
" k                     k    '   k 


,-k 


1/2  •  2   .   Then, 


7k+l   7k  "  rk 


=  7k  +  1/2  '  2"\ 


Clearly  the  signs  of  x  and  7  must  agree,  so 

k      k 


-2~k  <  7n  <  0 
—  k 


and  thus, 


k+11 


<  2"(k+l) 


-k  -k 

Range  2:   Suppose  -c'  •  2   <  x^  <  c'  •  2    so  that   s  =0,   r  =  0.  Now, 


*k 


k+i  "  2yr-  7 


2V^"d-^) 


c'  .  2 


-k 


7       < 

7k+l'  - 


k 


2Vr<l-a^ 
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2-yr  .  c  •  2_k 
i  < 


2V^d-^) 


7      < 


5/12  •  2_k 


(1  -  2  *) 

For  k  >  3, 

1 

<  8/7 


1  -  2"k  " 


so 


|7k+1|  <  5/12  '  8/7  •  2"k<2"(k+l). 

-k  /     -k 

Range  3:   Suppose  x  >  c'  •  2    so  that   s  =  1,   r  =  1/2  •  2   .   Then, 

7k+l  =  7k  "  rk 

=  7.  -  1/2  '  2"\ 


Since  the  signs  of  x^  and  7  agree, 

0  <  7,  <  +2~k 
k  — 

and  thus 


Hence,  for  all  k, 


7k|  -  l\  -^1  <  s"k. 


Q.E.D. 


(k-2) 

Next  it  is  shown  that  |u,|  <  1  for  all  k.  Recall  u.  =  2V    'x,. 

Since  x  €  [l/k,    1),   u  e  [l/l6,  l/k) .   Also, 
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M  -  K  -  *l 

<  2"k  (-  ~  +  2"k  +  -^) 

<  2"k  (2-yr+  2"k) 

|\|  <  iA  (2Y7+   2_k) 

<  l/k   (2  +  1/2)   for  k  >  1. 


Hence 


,   I -u-  J  <  1  for  all  k. 


7.^  Experimental  estimate  of  speed 

The  Monte  Carlo  estimate  of  the  mean  probability  of  a  zero  is  0.66l, 
with  a  corresponding  shift  average  of  2.96. 

7.5  Implementation 

The  recursion  formulas  for  implementation  are  (7-2)  and  (7_3). 
Figure  7  shows  the  corresponding  block  diagram. 

7.6  Concluding  remark 

Two  square  root  algorithms  have  been  studied  in  detail,  a  multipli- 
cative scheme  in  Section  6  and  an  additive  scheme  in  this  section.  The 
multiplicative  scheme  clearly  requires  more  hardware  to  achieve  a  speed  com- 
parable to  that  of  the  additive  scheme;  it  cannot  achieve  equality  of  speed. 
It  thus  appears  that  the  multiplicative  algorithm  should  be  discarded  in 
favor  of  the  additive  algorithm,  and  this  is  probably  true  in  a  strictly 
binary  implementation.   However,  the  algorithm  of  Section  6  offers  two  re- 
deeming features  :   first,  the  comparison  constants  need  not  be  scaled; 
second,  the  algorithm  easily  lends  itself  to  a  higher  radix  implementation, 
which  the  algorithm  of  Section  7  does  not. 
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8.   THE  ALGORITHM  FOR  EXPONENTIAL 

8.1  Basic  algorithm 

The  exponential  e  may  be  evaluated  in  three  multiplication  cycle 
times,  two  of -which  are  required  to  identify  a  convenient  operand  which  may 
be  normalized  to  zero.  Let 


X    X  log2e  loge2 
e  =  e 

(1+f)  In  2 

=  e 


where  X  log  e  =  I  +  f 

I  =  integer 
-1  <  f  <  +1. 


Then, 


X    I  gn  2  f  £n  2 
e  =  e       e 

=  2  e 


where 


x  =  f  £n   2 

-in  2  <  x  <  +£n   2. 

Thus  three  basic  steps  are  required: 

(1)  Multiply  X  by  the  precomputed  and  stored  constant  log^e 
to  identify  I,  f . 

(2)  Multiply  f  by  the  precomputed  and  stored  constant  log  2 
to  generate  x. 

(3)  Evaluate  e  via  the  normalization  scheme  discussed  below. 
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83 

Then, 


'j 


e  =  e  2  ,   e  e  [1/2,  2) 


The  first  two  of  these  operations  may  be  performed  by  the  multiplication 
algorithm  of  Section  k,    and  require  no  further  discussion.   It  is  the  major 
purpose  of  this  section  to  formulate  an  algorithm  to  evaluate  e  ,  |x|  <  £n  2, 
The  algorithm  proposed  here  is,  in  some  sense,  the  dual  of  the  algorithm 
proposed  in  Section  5  for  gn  x.  Let 

M  M 

x  =  x  -  0n(  II  e. )  +  gn(  H  e.) 
i  l 

i=0         1=0 

where 

e,  =  1  +  1/2  sn  2~k,   1  <  k  <  M. 


Then, 


M  M 

x  =  (x  -  Z  gne.)  +  gn(n  e.) 
li 
i=0  i=0 


M  M 

(x  -  Z  jne.)  0n(  II  e. ) 
i  i 

x        1=0  1=0 

e  =  e  e 


M 
(x-Z^ne.)     M 

1=0       •  (  H  e.). 
1=0 


The  set  of  multiplier  constants  is  chosen  such  that 


M 

x+  Z  (-  0  n  e . )  =  0 

i 
i=0 


so  that, 


81+ 

M 

X       ~       TT 

e   =  H  e. , 
i=0  x 


The  required  set  of  precomputed  constants,  {-£n  e.}  is  exactly  the  same  set 
required  by  the  algorithm  for  ?n  x. 

Thus,  to  evaluate  e  ,  two  simple  recursions  are  performed. 


xk+1  =  xQ  +  Z  (-in  e.)      xQ  =  x 

i=0 


=  xk  +  (-£n   ek)  (8-1) 


E.  1  =  n  e.  E.  =  1 

k+1   .  _  l  0 

i=0 


=  Ek  +  skEk2"(k+l)  (3-2) 


where 


1   if   xk  <  -3/8  *  2 


-k 


sn  =  <  0       otherwise 
k    ^ 

1   if   x,  >  +3/8  •  2~k. 
V  K 

8.2  Choice  of  initialization  step 

The  initialization  of  this  algorithm  consists  of  a  very  small  (five 

value)  table-lookup  and  requires  storage  of  four  additional  precomputed  con- 

+   +        n     ±  1/2    ±  lA 
stants,  namely,  e     ,  e 
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TABLE  12 

X 

e 
0 

An  e 
0 

[|.  An2) 

.1/2 

1 
2 

LU»  2; 

eiA 

1 
1+ 

1 

0 

L  2,  u; 

e"lA 

1 

(-£n2,  -|) 

e"1/2 

1 
"2 

Since  E  =  1,  no  multiplication  is  required  to  form  E  =  e  .   Since 

x  =  x  -  in  e  and  #n  e  contains  at  most  one  non-zero  bit,  x  can  also  be 

formed  quite  easily. 

It  is  shown  in  the  next  section  that  such  an  initialization  leads 
to  a  convergent  algorithm. 

An  example  of  the  evaluation  of  e  is  given  below. 
Example :   If  x  =  0.6,  then  eX  =  1.82211880039051.  The  error  bound  derived  in 

Section  8.3  indicates  that,  for  M  =  kO,    the  error  should  be  less  than 

-12 
O.69  x  10   .  The  actual  error  in  the  approximation  produced  by  the  algorithm 

-12 
is  less  than  0.62  x  10 
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TABLE  13 


1 

0 

2 

1 

3 

0 

1+ 

0 

5 

-1 

6 

0 

7 

0 

8 

-1 

9 

0 

10 

0 

11 

0 

12 

0 

13 

-1 

Ik 

0 

15 

-1 

16 

0 

IT 

-1 

18 

0 

19 

0 

20 

1 

21 

0 

22 

0 

23 

0 

2k 

0 

25 

0 

26 

-1 

27 

0 

28 

-1 

29 

-1 

30 

0 

k+1 
0.10000000000000 

-0.01778303565638 
-0.01778303565638 
-0.01778303565638 
-0.002031+67868821+ 
-0.002031+67868821+ 
-0.002031+67868821+ 
-O.OOO0796I+3852UI+ 
-0.0000796U3852HU 
-0.0000796U3852U)+ 
-0.0000796U3852l^ 
-0.00007961+3852l|l+ 
-O.OOOOI8606833U7 
-0.0000186068331+7 
-0.00000331+792799 
-O.OOOOO33U792799 
0.O0O0OO1+6677655 
0.0000001+6677655 
0.0000001+6677655 
-O.OOOOOOOIOO60U9 

-O.OOOOOOOIOO60U9 
-0.0000000100601+9 
-0.000000010060l+9 
-O.OOOOOOOIOO60U9 
-0.000000010060l+9 
-0.00000000260991 
-0.00000000260991 
-0.00000000071+727 
0.00000000018405 
0.000000000181+05 


k+1 

1.6U872127070013 
1.85U811U295376U 
1.85U811U295376U 
1.85U811U295376U 
I.82583OOOO95IU 
I.82583OOOO95IH 
I.82583OOOO95IH 
1.82226392673051 
1.82226392673051 
1.82226392673051 
1.82226392673051 
1.82226392673051 
1.82215270U 56701 
1. 82215270456701 
1.82212U90072326 
1.822121+90972326 
1. 8221179^986838 
1. 8221179^986838 
I.822II79W6838 
1.82211881872192 
I.822H881872192 
1.82211881872192 
1.82211881872192 
1.82211881872192 
1.82211881872192 
1.82211880  51 1+608 
1.8221l8805ll+608 
1.82211880175212 
1.82211880005511+ 
I.822II88OOO551I+ 


(Continued) 
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TABLE  13   (Continued) 


Lk+1 


31 

1 

-0.00000000001*878 

32 

0 

-0.00000000001*878 

33 

-1 

0.0000000000091*3 

3>* 

0 

0.0000000000091*3 

35 

0 

0.0000000000091*3 

36 

1 

0.00000000000216 

37 

0 

0.00000000000216 

38 

1 

0.00000000000031* 

39 

0 

0.00000000000031* 

i*o 

0 

0.00000000000031* 

k+1 

1.8221188001*7938 
1.8221188001*7938 
1.82211880037332 
1.82211880037332 
1.82211880037332 
1.82211880038658 
1.82211880038658 
1.82211880038989 
I.822II88OO38989 
I.822H880038989 


8.3  Error  bound 

The  error   in  the  approximation 

M 
e       =     IT     e. 
i=0     x 

is  strongly  related  to  the  value  of  the  function  itself: 

leX-*Wil 


eXM+1  -  llE 


M+l 


where 


M 


Vi =  A  ei 


i=0 


Thus,  a  bound  on  E^  is  required  in  addition  to  a  bound  on  x^  .  In  pro- 
ducing a  bound  on  r  ,  two  preliminary  results  are  helpful.  First  recall 
the  power  series 
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&n(l  +   &)  =  6  -  1/2  52  +  1/3  63  -  l/k   6  +  ... 

[-1  <  5  <  1]. 

Then,  for  k  =  1,  2,  ... 

|in(l  +  2"k)|  <  |in(l  -  2"k)|.  (8-3) 

Next  it  is  shown  by  induction  that 

|in(l  -  2~k)|  <  3/2  •  2~k,   k  =  1,  2,  ....        (8-U) 

-k 
Observe  that  since  in(l  -  2  )  <  0,  statement  (8-10  is  equivalent  to 

-£n(l  -  2"k)  <  3/2  •  2~k,   k  =  1,  2,  ... 


or 


or,  exponentiating, 


in(l   -  2_k)  >   -3/2    •    2~k,        k  =  1,    2,    ... 


-k 
1   -  2~k>  e"3/2    *   2      ,  k  =  1,    2,    ...  (8-5) 


which  is   in  more   convenient  form  for  an  inductive  proof  than  is   (8-U). 
Proof:      Since  e     '      =  0.^7  <  1/2,    (8-5)    is  true  for  k  =  1.     For  some  k, 

1  -  2"k  >  e-3/2  •  2"k. 


Then  surely, 


1  -  2"k  +  1A   •  2"2k  >  e-3/2   •  2"k 


or 


(1  -  2-(*+D)2  >  e-3/2    •    2"k 


89 
or,  taking  positive  roots, 

1.2-(^).>e-3/2-2"(k+l). 

Q.E.D. 

Now  let  us  produce  a  bound  on  r  ,  For  the  initialization  chosen 
in  the  last  section,  one  may  easily  show  by  direct  computation  that 

x1  =   xQ  -  in  eQ  e   [-l/k,   +l/k) . 

/•,  -,  \ 

Hypothesis :   |x_  |  <  3/8  "  2  ^     for  all  k.  The  hypothesis  is  true  for  k  =  0 

since  |x  |  <  £n   2  <  $/h   and  true  for  k  =  1  since  |x  |  <  l/k  <   3/8.   The  in- 

th 
duction  proof  is  completed  by  considering  the  mapping  from  the  k   step  to 

st 
the  (k+l)   step. 

-k  -k  - 

Range  1:   Suppose  -3/U  -2   <  3c  <  -3/8  '  2  L  so  that  s  =  1;  then, 


w  =  \-«*-2"M> 


or. 


-3A  •  2"k  +  |in(l-2"(k+l))|  <  ^+1  <   -3/8  •  2"k  +  |in(l-2"(k+l))|. 
By  (8-10, 

|in(l-2-(k+l))|  <3A  '  2"k, 
and  by  a  power  series  expansion, 

|in(l  -  2-(*+l))|  =2-(k+l)+l/2  .2-2^K    ... 


or 


|in(l  -  2'(k+l))|  >  3/8  •  2"k. 
Thus,  for  Range  1, 
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-k 


K+1|<3/8-2 

Range  2:   Suppose  -3/8  •  2~k  <  x,  <  3/8  *  2~  so  that  s  =  0;  then 


-k 


K+1|  -  1^1  <  3/8  •  z-\ 

Range  3:   Suppose  3/8  •  2~k  <  x^  <  3/U  •  2~k  so  that  s  =  1;  then 


Xk+1 


=  x^  -  £n(l  +  2~(k+l)). 

From  (8-3)  and  (8-4), 

Jn(l  +  2"(k+l))  <  3A  •  2"k. 
But  from  the  power  series  expansion, 

m(i  +  s-<k+1)  >  2"(k+1)  -  1/2  •  2"2(k+1) 


or 


ei 


m(i  +  2"(k+l))  >  1/2  •  2"k  -  1/8  ■  2"k  •  2' 

which  for  k  >  1  yields 

in(Z.+   2"(k+l))  >  3/8  '  2"k. 
Thus, 


3/8  •  2"k  <  £n(l  +  2_(k+l))  <  3A  *  2' 


Therefore, 

K+1|  <  3/8  •  2"k 

for  this  range  of  x,  also. 
Hence,  for  k  =  0,  1,  2,  ..., 

I\J  *  3/8  •  2"k. 


Q.E.D, 


91 
Note  further  that 


:+i' 


<   IVJ    6 


so  that  the  error  in  the  approximation  to  e  is  less  than 


x  M 

IVll    e  (.ITnei) 

1=0 

where 


l^+il  <  3/8  •  2"M. 


For  any  reasonable  register  length,  say  M  >  10, 


K+il 

e        <  1.001, 


so  the  error  may  be  bounded  by 


or 


M  M  M 

eX  -  (  E  e.)|  <  3/8  '  2"M(1.001)  (  H  e.) 
1=0  X  i=0  X 


eX  -  VJ  <  0.376  ■  2"M  EM+1. 


By  direct  computation, 


M        /   M 

IT  e.  <  e1/2  n  (1  +  1/2  s.2"1) 


i=0  X       i=l 


<  e1/2  "  (1  +  2-^) 


1=1 

<  3.31. 


Thus  surely 
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W  <  l-^    ■    2""- 


But  since  E^   is  a  close  approximation  toe  ,  it  is  substantially  less  than 
3.31: 


\-   *" 


;+1  "  1  -  0.376  •  2"M  "  1  -  0.376  •  2'M 
E^+1<2(1+  1/2  •  2~M). 


Therefore, 


:•: 


•  E^J  <  0.376  •  2"M  ♦  2(1  +  1/2  •  2"M)  <  0.753  *  2"M 


for  M  >  10. 


Hence,  the  performance  of  M  +  1  steps  beyond  the  initialization 


x 


suffices  to  guarantee  M  correct  bits  in  the  approximation  to  e  . 


Q.k     Experimental  estimate  of  speed 

The  Monte  Carlo  estimate  of  the  mean  probability  of  a  zero  is  O.669, 
"with  a  corresponding  shift  average  of  3.0^. 


J: 


8.5  Implementation 

Making  the  usual  transformation,  u^  =  2  x  ,  such  that  |  tjl  |  <  1  for 
all  k,  and  rewriting  recursion  (8-l),  one  obtains, 


Vi 


=  2 


ok  /  „  /-,      -(k+l)x\ 
\+  2     (  -£n(l  +  sk2     0] 


uQ  =  x 


(8-6) 


with  recursion  (8-2)  remaining  unchanged. 
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\+i  -  \  +  sA2"(k+1)>     Eo  =  1'  El  =  V       (8"7) 

A  hardware  configuration  to  implement  these  recursions  is  shown  in  Figure  8. 

8.6  Concluding  remark 

v 

An  algorithm  for  evaluating  e  in  three  multiplication  cycle  times 

has  been  proposed.  The  algorithm  requires  storage  of  only  a  few  precomputed 
constants  not  required  by  other  algorithms;  the  hardware  required  for  imple- 
mentation fits  within  the  requirements  of  other  algorithms  previously  proposed, 
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9.   THE  ALGORITHM  FOR  TANGENT  (OR  COTANGENT) 

9.1  Basic  algorithm 

The  tangent  (or  cotangent)  of  any  argument  X,  0  <  X  <  2«,  may 
be  formed  in  two  multiplication  cycle  times  beyond  an  initial  range  reduction. 
From  the  standard  identity, 

tan(X  +  a)  =  tan  X 

it  is  clear  that  one  need  only  consider  arguments  in  the  range  0  <  X  <  it . 
Further,  from  the  identity, 

tan(X  +  jt/2)  =  -ctn  X 

it  is  clear  that  the  range  may  be  reduced  to  0  <  X  <  tt/2.  Hence,  the  initial 
range  reduction  requires  storage  of  the  values  jt,  it/ 2;   whether  one  stores  both 
values  explicitly  or  obtains  one  as  a  shifted  version  of  the  other  is  a  moot 
question. 

The  algorithm  proposed  here  to  evaluate  tan  X  (or  ctn  X),  0  <  X  <  it/2, 
requires  the  use  of  the  complex  exponential, 

eJ  =  cos  X  +  j  sin  X 

where  the  operator  j  represents  a  90°  counterclockwise  (positive)  rotation  in 
the  complex  plane.  The  normalization  procedure  is  conceptually  identical  to 
that  employed  in  the  division  algorithm,  except  that  the  multiplier  constants 
are  complex  numbers.  Let 

M 

H  t. 

JX    jX   i=0  x 
e   =  e 


M 

n  t. 
1=0  x 
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1 


where 


Then, 


and 


J<Pk 

t.    =     t,    e        ,        t.      >  0. 
k        '    k1  '      '    k1 


M 

ir    t.  =  (  n  It  !)e  1=0 

i=0     x  i=0     1 


M  M 

j(X  -  E  cp  )  IT     t 

jX  i=0  i=0 

e        =  e 


M 

TT      It.  | 
i=0       X 


If  the  multipliers   are   chosen  such  that 


then 


where 


M 

x  -  E    cp.  =  o 
i=0     x 


M 
e^X  =   cos  X  +  j    sin  X  =  K  n     t. 

i=0 


K 


M 

n 

i=0 

It.  1 
1  i ' 

Hence,    if 


M 

III     !       S     S.I     !       +     J      III!      =  n  t- 

M+l  M+l        °      M+l        .    _      i 

1=0 


Wlth  Vl'    Vl  rSa1'    then 


M+l 
tan  X  = 

RM+1 
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ctn  X  = 


% 


+1 


M+l 


independent  of  K. 

For  simplicity  sake,  assume  that  the  tangent  is  required.  The 
range  of  the  operand  may  be  further  limited  to  0  <  X  <  jt/U  as  follows: 


(1)  If  0  <  X  <  it/k,    let  x  =  X  and 

,   Y  ,    ~  hi+i 

tan  X  =  tan  x  = ; 

RM+1 

(2)  If  it  A  <  X  <  jt/2,  let  x  =  tt/2  -  X  and 

R. 


tan  X  =  ctn  x  = 


M+l 


"Wl 


Having  performed  the  indicated  range  reductions,  one  need  only  formulate  an 

ix 
algorithm  to  compute  the  real  and  imaginary  parts  of  e  ,  a  final  division 

being  required  to  evaluate  the  tangent. 

Four  constraints  are  placed  on  the  set  of  multipliers:   (l)  the 
summation  of  the  angles  must  approach  x;  (2)  the  magnitudes  must  be  non- 
zero; (3)  the  continued  product  of  the  multipliers  must  be  easy  to  compute 
in  rectangular  coordinates;  (h)     approximately  two-thirds  of  the  multipliers, 

on  the  average,  should  be  1  +  jO,  if  possible.  The  following  choice  for  the 

th 
k   multiplier  satisfies  these  constraints: 


tk  =  1  +  j  1/2  3^2 


-k 


where 


if 


Sk  = 


3^  <  -3/8  •  2 
otherwise 


-k 


if   ^  >  3/8  '  2" 
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k 
*k+l  =  xo  "  A  V    xo  =  x* 


i=0 

Observe  that  one  need  not  compute 

1 


K  = 


M 

IT   It. I 
i=0  '  x 


since  it  is  merely  a  scale  factor  which  disappears  during  the  final  division. 

With  the  exceptions  of  a  few  additions  required  for  the  initial 
range  reductions  and  the  final  division,  one  need  only  perform  the  recursions 
developed  below. 

k 

x.    =  xn  -  Z  (p.  X_=  X 

K+l     0    .  _  Yl  0 

1=0 


*k 


r-n, 


,   -1,    -(k+lK 
=  x^  -  tan  (sk2     J) 


=   ^  -  sk  tan"1(2-(k+l))  (9-D 


i.e., 


Tk+1  ■  \+l  +  J  h.i  To  =  x  +  J0 


n  t. 
i=o  x 


=  (Rk+  j  Ik)(l+  j  sk2"(k+l)) 


Vi  =  \-W(k+1)  ^ 
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The  set  of  constants,  {tan  (2   )}  ,  must  be  precomputed  and  stored  in  the 
ROM.  A  series  expansion 

tan"1  5  =  6  -  1/3  &3  +  l/5  65  -  l/7  6T  +  . . . 

[|B|  <1] 

indicates  that  only  one-third  of  these  constants  need  be  explicitly  stored 
since,  for  k  >  [M/3],  tan  (2  )  =  2   to  machine  accuracy. 

9.2  Choice  of  initialization  step 

At  the  cost  of  storing  one  additional  precomputed  constant,  one 
may  choose 


tQ  =  1  +  j  tan(jt/8) 


so  that 


x  =  x  -  jt/8  e  [-n/8,   +it/8) 


R1=l 

I  =  tan(it/8), 

the  value  tan(jt/8)  being  stored. 

An  example  of  the  normalization  procedure  and  the  step-by-step 

values  of  R  and  I  is  listed  in  Table  1^.   Presumably  the  division  algorithm 

is  used  to  compute  the  ratio  L„  n/R„  .,  to  produce  the  tangent. 
^  M+r  M+l 

Example:   If  x  =  0.6,  then  tan  x  =  O.68U1368083U169.  The  algorithm  produces 

R^   =  0.921308710261+29,  I    =  O.6303OI2OO5377U.  The  ratio  is 

-12 

O.68U1368083U183,  which  differs  from  the  correct  value  by  O.lU  x  10 

-12 

This  error  is  within  the  error  bound  of  0.68  x  10 
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9.3  Error  bound 

-k 
It  is  first  shown  that,  for  k  >  2,   |xj  <  3/h    •  2  ,  and  that  if 

u,  =  2  x_  then  |  u.  |  <  1  for  all  k. 

Recall  that  the  range  reductions  resulted  in  |xn|  <  n/h   so  that 

|un]  <  1.  Also,  since  |x  |  <  fl/8,  |u  |  <  1.  Consider  the  choice  of  t  • 

t1   =  1  +  l/U  s 

cp,  =  s,  tan  (lA) 

=  O.2UU98  s. 


1 


where 


1   if   x,  <  -3/l6 


s  =  <  0       otherwise 


1   if   x  >  3/l6. 
Range  1:   Suppose  -  n/Q   <  x  <  -3/16  so  that  s  =  1.   Then 


X2  =  Xl  "  ^1 


=  x±  +   tan_1(l/U) 


so  that 

|x2|  <  0.16  <  3/16 

,|  <  0.6^  <  1. 


u. 


Range  2:   Suppose  -3/16  <  x  <  3/16  so  that  s1  =  0;  then  |x  |  =  |x1|  <  3/16 

and  I u2 I  <  1. 

Range  3:   Suppose  3/16  <  x  <  it/8  so  that  s1  =  1.  By  symmetry,  |x2|  <  3/l6, 

lu2l  <  1- 
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Hypothesis:  For  k  >  2,    |xj  <  3/U  •  2~k,  |uj  <  1. 

Proof:  The  hypothesis  has  been  shown  to  be  correct  for  k  =  2.  The  proof  is 

completed  by  induction. 

Range  1:   Suppose  -3/U  •  2   <  x  <  -3/8  •  2~  .  Then, 

"hi  ■  \ +  *-"V(k+1)) 

and 

-3/U.  2-k  +  tan-1(2-(k+l))  <  x^  <  -3/8  •  2~k  +  tan'V^"^) . 

Now, 

tan'V(k+l))  =  2"(M)  -  1/3  •  2-3<k+1>  +  l/5  •  2"5(k+l)  -  ... 


so 


2"(k+l)  -  1/3  •  2"3(k+l)  <  tan-V(k+l))  <  1/2  •  2"k. 


Since  k  >  2, 


or 


Thus 


or 


2"k(l/2  -  1/1536)  <  tan"1(2"(k+l))  <  l/2  •  2 


767/1536  '  2"k  <  tan_1(2"(k+l))  <  l/2  •  2~\ 


2"k(-3A  +  767/1536)  <  xk+1  <  2-k(-3/8  +  1/2) 


-385/1536  •  2"k  <  xk+1  <  1/8  •  2"\ 


Hence 


10k 


here. 


Range  2:   Suppose  -3/8  •  2~  <  x,  <  +3/8  •  2~  .  Then  s  =  0  and 
Ix,   I  <  3A  '  2"(k+l) 

Range  3 :   The  proof  for  this  range  follows  that  above  by  symmetry  consider- 
ations.  Hence,  |x  |  <  3/U  ■  2~  ;    also  |uj  <  1.  Finally,  |x„  1|  <  3/8  ■  2"  . 

While  the  error  in  the  normalization  process  has  been  bounded  by 

-M 
3/8  •  2  ,  it  is  not  clear  that  the  values  of  cos  x  and  sin  x  are  known  to 

such  precision.  An  approximate  error  bound  on  the  error  in  tan  x  is  developed 

below. 

First   observe  that 

jx       __     JXM+l/_  .   _        n 

e        =  K  e  (R^  +  3   3^) 

so  that,  equating  real  and  imaginary  parts, 

K  Vl  =  C°S(X  "  W 

K  W  =  sin(x  "  W' 

where  K  R..  ,  is  the  approximation  to  cos  x  and  K  L,,  .  is  the  approximation 
M+l        ^  M+l 

to  sin  x.  The  error  in  the  approximation  to  tan  x  is  thus  given  by 

E^     =  tan  x  -  tan  (x  -  x.  _ ) 
tan  x  M+l 

tan  x  -  tan  x^  1 
=  tan  x  - 


*M+1 


1  +  tan  x  tan  x^  .. 
2 


tan  x,,  n    sec  x 


1  +  tan  x  tan 


*M+1 
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Since  0  <  x  <  n/k,      1  <  sec  x  <  2  and  0  <  tan  x  <  1.  Therefore, 

,-    2  tan  l^ll 
1  tan  x1    1  -  tanjx^^^  | 

But  |xj.  1|  <  3/8  •  2~M,  so,  to  first  order,  |tan  x^  _|<  3/8  •  2~M  and 

lEtan  ,1  2  3A  •  2'M. 

When  the  value  of  X  exceeds  it/U,   x  =  jt/2  -  X,  and  one  computes 
ctn  x  rather  than  tan  x.  The  analogous  error  bound  is  given  by 


1  +  ctn  x  ctn  x^ 
E  ,    =  ctn  x  +  - 


ctn  x  ctn  x  -  ctn 


2 

CSC  X 


*M+1 


ctn  x  -  ctn  xM+1 
2  tan  x^        esc  2x 


tan  x^   -  tan  x 

Therefore,  to  first  order, 

2 1  x„T  _  I  esc  2x 


ctn  x1    tan  x  -  x.„  n 
1         M+l1 

It  may  be  observed  that  as  x  approaches  zero  (X  approaches  «/2)  the  error 
increases  without  bound;  such  behavior  is  to  be  expected  since  tan  X  itself 
increases  without  bound.   If  x  »  |xM  -,\,   then  the  error  may  be  approximated 

by 

2 


E 


ctn  x 


I  =  Ivil  csc  x>     x>>  IVil 


which  is  seen  to  approach  3/h    •  2   as  x  approaches  n/h,   matching  the  error 
bound  for  the  tangent. 


106 

If  one  wishes  to  achieve  a  small  error  in  the  case  where  X  -»jt/2, 
(x  -*0),  it  may  be  best  to  turn  to  a  power  series  expansion  such  as 

ctn  x  =  l/x  -  1/3  x  -  I/U5  x3  -  2/9^5  x5  -  ... 


2n  2n-l 

Ts^Ti — x 


[|x|  <«] 


where  B  is  a  Bernoulli  number.  When  x  is  very  small,  the  first  few  terms 
in  such  a  series  suffice  to  yield  an  acceptably  small  error.  For  example, 
if  x  <  ~\/l/3  *  2~^M+-l-<',  then  ctn  x  =  l/x  to  machine  accuracy. 

Similarly,  if  the  tangent  of  X  is  required  and  X  is  sufficiently 
small,  it  would  be  better  to  use  a  power  series  expansion  here  also. 

tan  X  =  X  +  1/3  X3  +  2/15  X5  +  

[|X|  <  jr/2] 


If  X  <  ~\/3  •  2      ,  then  tan  X  =  X  to  machine  accuracy. 

9.h     Experimental  estimate  of  speed 

The  normalization  of  the  operand  x  and  the  parallel  evaluation  of  the 
approximations  to  cosine  and  sine  require  approximately  one  multiplication 
cycle  time.  The  Monte  Carlo  estimate  of  the  mean  probability  of  a  zero  for 
this  process  is  O.653?  with  a  corresponding  shift  average  of  2.90.   In  addition, 
as  many  as  three  range  reduction  subtractions  are  required  to  obtain  x.  A 
final  division  to  evaluate  either  tan  x  or  ctn  x  is  also  required.  Hence,  an 
average  of  slightly  more  than  two  multiplication  cycle  times  is  required  to 
evaluate  tan  X. 
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9.5  Imp  leme  nt  at  i  on 


Figure  9  shows  a  hardware  configuration  to  implement  recursion 
relations  (9~l),  (rewritten  in  terms  of  u=  2  x  ),  (9-2),  and  (9-3). 

9.6  Concluding  remark 

An  algorithm  for  computing  tan  X  (or  ctn  X)  in  approximately  two 
multiplication  cycle  times  has  been  developed.   It  has  been  shown  that,  for 
certain  values  of  X,  it  may  be  advisable  to  consider  a  power  series  expansion 
as  an  alternative  to  minimize  the  error. 
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10.   THE  ALGORITHM  FOR  COSINE  AND  SINE 

10.1  Basic  algorithm 

The  algorithm  for  computing  the  cosine  and  sine  of  an  argument  X, 
0  <  X  <  2jt,  differs  from  the  algorithm  for  tangent  in  only  two  respects. 
First,  the  constant  K  must  be  computed  since  the  cosine  and  sine  are  ex- 
plicitly required,  rather  than  merely  their  ratio;  second,  the  final  division 
is  unnecessary.  The  chief  concern  here  is  thus  the  evaluation  of  K. 

Recall  that 


K  = 


n  a/i  +  s2  2^^ 


i=0 


AM 


—    -* 
Clearly,  if  the  algorithm  is  performed  non-redundantly  with  s  e  [1,  1} , 

then  K  can  be  precomputed  and  stored.  Such  a  choice,  however,  guarantees 
that  at  every  step  an  addition  must  be  performed,  thus  slowing  down  the 
algorithm  considerably.  A  compromise  between  Specker's  non-redundant  al- 
gorithm and  the  fully  redundant  (s  e  {1,  0,  1},   k  =  1,  2,  ...,  M)  algorithm 
one  would  like  to  perform  is  developed  here. 
Consider  an  expansion  of  l/|t  | : 

1  1 


K\ 


k1 


A/l  +  si  2-S^ 


.  1  .  a  2-(2k+3)  2  2-(Uk+6) 

k  k 

2  -(Uk+T)  .  .  .     -   . 

+  s  2  +  higher  order  terms. 

K 


*  Such  an  algorithm  was  proposed  by  Specker. 
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For  k  <  (—)—),   at  least  three  terms  in  the  expansion  are  required  to  repre- 
sent l/lt,  |  to  machine  accuracy;  it  is  preferable  to  disallow  s  =  0  and 
precompute  the  constant  required.   However,  for  k  >  (— r~) ,    it  is  preferable 
to  allow  s,  =  0;  the  computation  requires  a  simple  correction  factor.  For 

(^)  <  k  <  <**), 

M 

to  machine  accuracy.  For  k  >  (~^) f 


1 

=  1 


K\ 


to  machine  accuracy  and  no  correction  factor  is  necessary. 

In  summary,  then,  the  following  selection  rules  are  proposed: 

*  In 

t   =  1  +  j  tan  jt/o 

t   =1+^1/2  sk2_k,   k  =  1,  2,    ...,   M 

where 


(1)   for  k  =  1,  2,  ...,  fij-]   +  1, 


*  The  constants 


K1      =  IT 

|t0|       i=o 


(1  +  2-2(i+1))-1/2 


[—1+1 

k1  ■  = n    (1  +  2    v      ')    ' 


*ol      1=0 


are  precomputed  and  stored  and  serve  as  initial  values  R,  and  1^,  respectively. 


Ill 


sk  = 


if 


if 


*kS 


<  0 


*k 


>   0 


(One  addition  cycle  time  is  required  per  step.) 

2 


(2)   for  k  =  [^]+2,  ...,   ^]+l, 


1   if   xk  <  -3/8  •  2 
(0       otherwise 
1   if   x^  >  3/8  '  2 


-k 


-k 


-i-4  2-(a+3) 

k 


k1 


(Two  addition  cycle  times  are  required  if  s  /  0,  leading  to  an  average  of 
two-thirds  addition  cycle  times  per  step.) 
(3)  for  k  =  [~2]  +  2,  ...,  M,   sk  as  above, 


=  1. 


(A  single  addition  cycle  time  is  required  if  s  /  0,  leading  to  an  average 
of  one -third  per  step.) 

Since  roughly  one-quarter  of  the  steps  are  non-redundant,  one- 
quarter  are  redundant  but  require  two  additive  cycle  times  per  non-zero  step, 
and  one-half  are  redundant  and  require  one  addition  cycle  time  per  non-zero 
step,  approximately 

lA  M  +  lA  '  2  •  M/3  +  1/2  •  M/3  =  7/12  M 


addition  cycle  times,  on  the  average,  beyond  initial  range  reduction,  if  any, 
are  required  to  compute  the  cosine  and  sine. 
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Other  choices  for  the  set  of  multiplier  constants  {t.)  were  studied 
with  the  conclusion  that  the  set  chosen  appears  to  be  near  an  optimum.  A 
brief  discussion  of  the  reasoning  leading  to  this  conclusion  is  presented 

in  Appendix  B.  It  may  be  recognized  that  the  problem  faced  here  is  identical 

2 
to  that  faced  in  attempting  to  introduce  redundancy  into  Voider 's  CORDIC 

vector  rotation  scheme. 

10.2  Example 

Having  chosen 

tQ  =  1  +  j  tan(jt/8), 

so  that     cpn  =  ir/8,     R     =  K' ,      I     =  K» ' ,    it   is  possible  to  carry  out  an 

example . 

Example ;   If  x  =  0.6,  then  cos  x  =  0.82533561^90968  and  sin  x  = 

O.56J+6U2U733950U.  The  algorithm  produces  approximations  which  are  in  error 

_V2  -12 

by  O.k   x  10    and  0.1  x  10   ,  respectively.  The  error  bound,  derived  in 

-12 

Section  10.3,  for  both  cosine  and  sine  is  O.U5  x  10 

TABLE  15 


^k+1 


0 

0 

0.20730091830128 

1 

1 

-0.0376777^82559 

2 

-1 

0.0866772^972117 

3 

1 

0.02U258U3972522 

k 

1 

-0.00698139370505 

5 

-1 

0.0086U2331+9151+2 

6 

1 

0.00082999385532 

7 

1 

-0.0030762362766^ 

8 

-1 

-O.OOII23II376OI6 

k+1 

0.88706U17837978 
0.79520567503^72 
0.86885568228162 
0. 8382^322299^37 
0.8212U000959630 

0.8301 5091 WU06 

0.82579571H91+79 
0.82359277522U76 
0.82U70051+353106 


k+1 

0.367U3U01338025 
0.58920005797520 
0.1+897993W59586 
0.5UU102828738U6 
0.570297929^5703 
0.557^6605^30709 
0.56395160832853 
0.56717737282538 
0.56556879318627 


(Continued) 
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TABLE  15  (Continued) 


k+1 


k+1 


'k+1 


9 

-1 

-0.00014655157061 

0.82525285680565 

0.56476342156173 

10 

0 

-0.00014655157061 

0.82525285680565 

O.56U763U2156173 

11 

0 

-0.00014655157061 

0.82525285680565 

0.564763^2156173 

12 

-1 

-0.00002448125871 

0.82532179150388 

O.56U662678U805U 

13 

0 

-0.00002448125871 

0.82532179150388 

0.56466267848054 

Ik 

-1 

0.00000603631940 

0.82523902325696 

0.56463749139536 

15 

0 

0.00000603631940 

0.82533902325696 

0.56463749139536 

16 

1 

-0.00000159307513 

0.82533471539075 

0.56464378821596 

IT 

0 

-0.00000159307513 

0.82533471539075 

0.56464378821596 

18 

-1 

0.00000031427351 

0.82533579236181 

0.564 64221401389 

19 

0 

0.00000031427351 

0.82533579236181 

0.56464221401389 

20 

0 

0.00000031427351 

0.82533579236181 

0.56464221401389 

21 

1 

0.00000007585493 

0.82533565774061 

0.56464241078928 

22 

0 

0.00000007585493 

0.82533565774061 

0.56464241078928 

23 

1 

0.00000001625028 

0.82533562408530 

0.56464245998312 

24 

0 

0.00000001625028 

0.82533562408530 

0.56464245998312 

25 

1 

0.00000000134912 

0.82533561567147 

0.56464247228158 

26 

0 

0.00000000134912 

0.82533561567147 

0.56464247228158 

27 

0 

0.00000000134912 

0.82533561567147 

0.56464247228158 

28 

0 

0.00000000134912 

0.82533561567147 

0.56464247228158 

29 

1 

0.00000000041780 

0.82533561514561 

0.56464247305023 

30 

1 

-0.00000000004786 

0.82  533  561 1*88268 

0.56464247343456 

31 

0 

-0.00000000004786 

0.82533561488268 

0.56464247343456 

32 

0 

-0.00000000004786 

0.82533561488268 

0.56464247343456 

33 

-1 

0.00000000001034 

0.82533561491554 

0.56464247338652 

34 

0 

0.00000000001034 

0.82533561491554 

0.56464247338652 

35 

0 

0.00000000001034 

0.82533561491554 

0.56464247338652 

36 

1 

0.00000000000307 

0.82  533  56l 491144 

0.56464247339252 

37 

1 

-0.00000000000057 

0. 82533561 U90938 

0.56464247339552 

38 

0 

-0.00000000000057 

0.82533561490938 

0.56464247339552 

39 

0 

-0.00000000000057 

0.82533561490938 

0.56464247339552 

ko 

-1 

-0.00000000000011 

0.82 533 561 U9096U 

0.56464247339515 

uk 


10.3  Error  bound 


It  is  shown  that  the  errors  in  the  approximations  to  both  cosine 
-(M+l) 


and  sine  are  less  than  2 


-1/ 


From  the   choice  of  tQ,    |x   |    <  tt/8  <  tan"   (l/2).     For 


i  <  k  <  [Hj£]  +  i, 


where 


During   these  steps, 


\=   Sk 


tan-1(2'(k+l)) 


1   if 


k" 


\^ 


<   0 


1   if   xk  >  0. 


x,  I  <  tan_1(2"k). 
k1  — 


-1 


Proof:  For  k  =  1,   |x, |  <  tan  (l/2)  as  indicated  above.  The  inductive 
proof  that  |x^|  <  tan  (2  )  is  based  on  the  observation  that 

-1/  -kv      -1/  -(k+l)x  .  .   -1,  -(k+lK 
tan  (2  )  -  tan  (2      )  <  tan  (2      J 


i.e. , 


tan_1(2"k)  <  2  tan'1^  ■  2_k) 


This  observation  follows  from  the  power  series 


-1 


tan"  8=8-1/36+  l/5  6  -  l/7  8  +  .. 


[|5|  <  1] 


Since  for  some  k, 


jx^l  <tan"1(2"k) 
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and  since,  from  the  selection  rules, 


1*^1  <tan-1(2-k)  .tBV"1'), 


it  follows  from  the  above  observation  that 


|Vl|  Stan-^a"^1)). 


-M-6 


Thus,  if  t  -   [— r—  ]  +  1,  at  the  end  of  this  sequence  of  steps 


x   |  <  tan"1(2"(t+l)). 


For  t  +  1  <  k  <  M, 


\  =   Sk 


tan-1(2^k+1)) 


where 


Sk  = 


if 


\  <   "3/8  •  2 
otherwise 


-k 


1   if   xv  >  3/8  •  2' 


During  these  steps  also, 


|xJ  <tan-1(2-k). 


Proof;   From  above,  |x  , |  satisfies  the  hypothesis.  The  induction  proof  is 

completed  in  the  usual  fashion. 

-1  -k  -k 

Range  1;   Suppose  -tan  (2  )  <  xv  <  -3/8  *  2  ;  then 

,   -1,  -(k+l)x 

=  x  +  tan   (2  v     )  and 


lc 


k+1 


-tan-Vk)  +  tan-1(2-(k+l))  <  xv  _  <  -3/8  •  2"k  +  tan-1(2-(k+l)). 


k+1 


Since 
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and 


tan'V)  -  tan-1(2-(k+1))  <tan-1(2-(k+1') 


tarfV*)  -  3/8  •  2"k  <  tan_1(2_(k+l)) 


then 


l^^tan'V^15). 

Range  2:   Suppose  -3/8  •  2_k  <  x^  <  3/8  ■  2~k;  then 

\\+1\    <   3/8  •  2"k  <  tan"1(2"(k+l)). 

~k  -1   -k 

Range  3:   Suppose  3/8  •  2   <  x,  <  tan  (2   ).  The  argument  for  this  range 


follows,  by  symmetry,  from  that  for  Range  1. 


Q.E.D. 


The  approximation 


J  *M+1  ~  .    ._ 
e       =  1  +  jO 


is  made  with  an  error  of  magnitude  less  than 


1=0  '  1 ' 


or  less  than 


^+1' 


i=0  V 


Hence,  the  errors  in  the  values  of  cosine  and  sine  are  less  than  tan  (2      ) 

-(M+l) 


which  is  less  than  2 


117 


o 

i/)« 


K 

UJ 

H 

u>3- 

o<* 

UJ 

DC 

0 
K. 


cr 

2 

UJ 

•— ' 

O* 

H- 

Q 
O 

< 

_l 
_J 

U. 

fen 

ILU 
tf)Z 

AC 

O 
o 

I  S  I 

I  Q  i 

I  S  I 

I  1  I 


cO 


^ 


I      2 

I 


i-tr 

1  b_ 

— 

UJ 

to  j/ 
55 

UJ 
DC 

+ 


CM 


H 
I 


a) 

-P 

i 


OJ 


^ 


H 

rH 

+ 

+ 

X 

^ 

CVJ 


« 


CVJ 


CVJ 


^ 


K 


H 

+ 


K 


H 

+ 

H 


118 

10 . k     Experimental  estimate  of  speed 

Counting  a  non-zero  step  in  the  second  quarter  of  the  algorithm 
as  two  addition  cycle  times,  the  Monte  Carlo  estimate  of  the  mean  probability 
of  a  zero  is  O.U65,  with  a  corresponding  shift  average  of  1.88. 

10.5  Implementation 

Except  that  the  control  is  complicated  somewhat  by  changing 

operation  modes  in  the  midst  of  the  algorithm,  and  that  during  the  second 

quarter  of  the  algorithm  a  "multiplication"  of  R  and  I  by  the  factor 

K       K. 

(l  -  2       )  may  be  required  at  the  completion  of  a  non-zero  step,  the 
implementation  discussed  in  Section  9*5  suffices. 

10.6  Concluding  remark 

It  may  be  preferable  to  perform  the  entire  first  half  of  the  al- 
gorithm in  the  non-redundant  mode.   If  this  were  done,  it  would  require 
approximately  2/3  M  as  compared  to  7/l2  M  addition  cycle  times,  beyond 
initial  range  reduction,  to  compute  both  cosine  and  sine. 
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11.   THE  ALGORITHM  FOR  ARCTANGENT 


11.1  Basic  algorithm 

Formulating  a  normalization  scheme  required  to  force  an  arbitrarily- 
large  operand  to  zero  in  a  few  simple  steps  seems  a  rather  formidable  task, 
but,  in  this  case,  it  is  not  difficult.  A  common  trigonometric  identity  is 
useful. 


-1 


tan   X  =  Tt/h  +   tan 


-llx-i 

\X+1 


where 


X  =  x  •  2 


a 


X  6  [1/2,  1) 

OL   integer. 


Let 


8.   / 


0  if  a  <  o 

1  if  a  >  0 


and  consider  the  following  identity. 


-sa 


2    ! (X+l)  +  j(X-l) 


r  2-sq:(x+i)  +  j  2-sa(x-D 


M 

n  t, 

i=0  " 


M 

IT  t. 
1=0 


where  the  set  of  multipliers  {t.}  is  of  the  same  form  as  that  used  in  the 
cosine/sine  and  tangent  algorithms.  Taking  logarithms  on  both  sides  yields, 

in(2_Sa)  +  to[-^(X4-l)2  +  (X-l)2)+  j  tan"1^) 

M  M 

=  l0g(TM+l)  "  Z  in|ti'  "  J  E  ^i 
+     i=0         i=0 
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120 


where 


T. 


M+l 


=  *M+1  +  J  I 


M+l 


2_SQ;(X+1)  +  j  2_SQ;(X-1) 


M 

ir  t. 


J  i=0 


and 


log(Tk+1)  =  MAj^   +  I^+1)  +  j  t-"^)- 


Equating  imaginary  parts,  one  obtains, 


.      -i/x-i\       .      -l/Wl        8 
tan     krT     =  tan     I- -     L     cp. , 

\X+1/        \RM+l'    i=0  X 


Hence,  the  desired,  relation  is  obtained. 


tan 


X  =  n/k   +  tan" 


-1 


M+l 


\\. 


M 

E  cp. 


+r   i=o 


>-k 


Although  t,  -  1  +  j  l/2  s.  2   as  before,  the  selection  rules  for  s,  are  now 
k  k  '  k 

chosen  such  that  L^  t/R**  -1  is  very  nearly  zero,  so  that  one  may  approximate, 


-1 


M 


tan   X  =  n/k   -  Z  cp.  . 

i=0  X 


Three  things  should  be  noted  carefully:   (l)  the  required  set  of  stored 
constants,  {tan  (2      )},  is  already  required  by  other  algorithms  in  the 
set;  (2)   although  a  division  has  been  indicated  conceptually  in  the  trans- 
formation outlined  above,  no  actual  division  need  be  performed- -one  merely 
sets 

R00  -  2-flB(Xrt) 
I0Q  .  2-Sa(X-l)) 


which  requires  less  than  one  addition  cycle  time,  (this  is  only  the  first  part 
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of  the  initialization);  (3)  the  algorithm,  when  implemented  in  hardware,  is 
virtually"  the  same  as  the  algorithm  for  tangent,  the  only  significant  dif- 
ference lying  in  that  the  quantity  being  normalized  (indirectly)  is  the  ratio 

/x-i\ 

rr-r-  ,  rather  than  X  itself.  Comparisons  to  choose  each  s  are  performed 
with  respect  to  I,  ,  and  it  is  shown  that  \l^   -i/^™  1  I  i-s  very  small,  so  that 


tan 


\RM+1/ 


«! 


and  the  former  term  can  be  neglected.  Never  is  the  ratio  IM  -./R^  -,  actually 
computed;  rather,  it  is  shown  that  |l    |  <  3/8  *  2  ,  whereas  R^   >  3A. 

As  in  the  other  trigonometric  algorithms,  three  recursion  relations 
are  required. 

Vi  -  \  -  \  ^-^ 

\+i  -  \  -  \\  2"(k+1)  <^ 


where 


(l        if        I     <  -3/8    •    2 


) 

s,    =    {     0  otherwise 

k        ^ 

1        if       I     >  +3/8    *    2~k. 

Understanding  of  the  details  of  initialization  and  error  bound 
proof  is  facilitated  by  discussing  separately  two  cases:   (l)  S  =  0, 
0  <  X  <  1;  (2)   S  =  1,   1  <  X  <  00. 

11.2  Choice  of  initialization—error  bound:   Case  1 

The  first  part  of  the  initialization  consists  of  the  choice  of 
S  =  0  and  the  setting  of  values 
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R00  "  x  +  x>  Roo  e  [1>  2) 

Ioo-1-1'       '      Ioo€[_1'0)- 

The  second  part  of  the  initialization,  for  this  case,  consists  of  a  simple 
scaling  of  magnitudes. 

!3/U   if   0  <  X  <  l/k 
5/8   if  l/k  <  X  <  1/2 
1/2   if   1/2  <  X  <  1. 

Note  that  <p   =  0  and  that  R  =  R  t   ,   I  =  It,  can  be  formed  in  one 
addition  cycle  time.  The  third  and  last  part  of  the  initialization  consists 
of  beginning  the  recursion  counter  at  k  =  0,  rather  than  k  =  1  as  in  previous 
algorithms.   (Considering  this  step  as  part  of  the  initialization  avoids  the 
change  in  notation  that  would  otherwise  be  implied.)  By  direct  computation, 
it  may  be  shown  that  RQ  e  [3A,  l),   IQ  e  [-3 A,  0)  e  [-3 A,  3A),  which  is 
sufficient  to  lead  to  a  convergent  algorithm. 

Hypothesis:   For  k  >  0, 

-3A  '  2"k  <  Ik  <  3A  •  2~k 
3A  +  f (k)  <  \  <  1  +  2f(k) 

where 

f(k)  =  3A  S  |s,  J2"21,    s   =0. 
i=0 

Proof:   It  was  shown  above  that  the  hypothesis  is  true  for  k  =  0.   The  in- 
duction proof  is  completed  by  considering  the  three  ranges  of  I,  . 
Range  1:   Suppose  ~3/k    '   2~k  <  I  <  -3/8  *  2_  so  that  s.  =  1  and 


123 

Vi  ■  Rk  -  v"(k+1) 

L    .   =  L    +  Rt2"(k+1) 

k+1  k         k 


then, 


3A  +  f (t)  +  3/16  •  2"2k  <  Rk+1  <  1  +  2f(k)  +  3/8  •  2"2k 

-3A  -2"k+(3A  +  f(k))  l/2.2_k<T    <  -3/8-  2"k  +(l+2f(k))l/2.2"k 


k+1 


thus, 


3/h   +  f (k+1)  <  Rk+1  <  1  +  2f (k+l) 
-3/8  •  2"k  +  1/2  •  2_kf(k)  <  T    <  1/8  •  2"k  +  2_kf(k) 

K+-L 


Since  0  <  f(k)  <  l/k, 


3/k   +  f(k+l)  <  R    <  1  +  2f (k+l) 


k+1 
-3/8  •  2~k  <  Ik+1  <  +3/8  •  2"k 

as  desired. 

Range  2:   Suppose  -3/8  •  2~k  <  I  <  +3/8  '  2~k  so  that  sv  =  0  and  K.     ,  =  R,  , 

I    =  I  .   Since  when  s  =  0,  f(k+l)  =  f(k),  the  hypothesis  continues  to 


hold  in  this  range  also. 

p"k^n 
k 


Range  3:   Suppose  3/8  •  2_k  <  I  <  3/1+  •  2~k  so  that  s  =  1  and 


then, 


Rv   =  R     +  I.  2"(k+l) 
k+1    k    k 

T     =  I   -  R  2-(k+l) 
k+1    k    k 


3/U  +  f (k)  +  3/16  •  2"2k  <  Rk+1  <  1  +  2f (k)  +  3/8  '  2'2k 


12U 


-k 


3/8  -2'K-(l  +  2f(k))l/2  -2~k<  I    <  3A*2"k-(3A+f(k)),  1/2- 2"k 


"k+1 


thus 


3/U  +  f(k+l)  <  R^+1  <  1  +  2f (k+1) 
■1/8  •  2~k  -  2'k  f(k)  <  Ik+1  <  3/8  •  2'k  -  1/2  f(k)2 


But   0  <  f(k)  <  1/k,      so 


3/h   +  f (k+1)  <  Rk  1  <   1  +  2f (k+1) 

-3/8  •  2"k  <  Ik+1  <  3/8  *  2"k 


as  desired. 

Therefore  after  M  steps, 


.  „-M 


|lM+1l<3/8'2 
RM+1  -  3/k  +   f (M+1)  -  3/U 


so  that 


and 


^M+l,  <  2-(M+l) 


** 


+1 


Wl/ 


M+l 

The  error  in  the  algorithm  is  thus  small  enough,  in  this  case,  to  guarantee  M 
correct  bits  in  the  approximation  to  tan   X. 

Example :  As  seen  in  Table  l6,  the  choice  of  an  operand  X  =  0.6  leads  to  a  very 
quickly  convergent  algorithm  as  an  approximation  accurate  to  ik   decimal  places 
is  produced  in  only  two  steps  beyond  the  initialization. 
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11.3  Choice  of  initialization — error  bound:   Case  2 

The  first  part  of  the  initialization  consists  of  the  choice  of  S  =  1 
and  the  setting  of  values 

RQ0  =  2"a(X+l)  =  x  +  2"a,  RQ0  e  (l/2,  3/2) 

I0Q=  2"a(X-l)  =  x  -  2a,      IQ0  e    [0,  1). 


The  second  part  of  the  initialization  consists  of  choosing  t 


01' 


'01 


1    if   1  <  X  <  3/2 
1-jl   if   x  >  3/2. 


V 


The  third  part  of  the  initialization  consists  of  a  scaling  of  magnitudes. 


'02 


-  < 


I 


3A   if   1  <  R01  <  5A 
5/8   if   5A  <  R01  <  3/2 
1/2   if   3/2  <  R01  <  2 


where,  from  the  previous  operation, 


R01  6  [X,  2) 


I   e  [-1/2,  1/U). 


Thus,  by  direct  computation, 


R0  €  [3A,  1] 


l0  e  [-3/8,  +3/16)  e  [-3A,  +  3A]. 

-(M+l) 

Therefore,  by  the  argument  of  Case  1,  the  error  is  again  bound  by  2 

Example :   If  X  =  1.2,  then  tan"  X  =  O.87605805059819.  The  algorithm  produces 

-12 
an  approximation  which  is  in  error  by  less  than  0.2  x  10   ,  within  the  error 

bound  of  0.1+5  x  10"12. 
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11. k     Experimental  estimate  of  speed 

Because  of  the  sequence  of  steps  in  the  initialization,  the  effi- 
ciency of  the  algorithm  is  somewhat  less  than  desired,  although  the  time 
required  to  compute  tan   X,  0  <  X  <  oo,  is  still  only  slightly  greater  (about 
three  addition  cycle  times),  than  the  time  required  to  perform  a  division. 
For  the  steps  beyond  initialization,  the  mean  probability  of  a  zero  is  O.65O, 
with  a  corresponding  shift  average  of  2.87. 

11.5  Implement at  i  on 

As  has  been  the  usual  practice,  a  simple  transformation,  u,  =  2  I  , 
is  made  in  order  that  the  comparison  constant  may  remain  ±  3/8>  rather  than 
±3/8  *  2   .   Introducing  this  transformation  into  the  recursion  relations 
obtained  earlier  yields  the  following. 

Vi  -  \  -  <•*  t-"V(k+1))  m->o 

V  ■  2\  +  SA  C11"6)  ■ 

It  is  of  interest  to  note  that,  since  |u|  <  1,  IL    remains  fixed  during  the 
last  half  of  the  algorithm. 

Figure  11  shows  a  block  diagram  to  implement  these  recursions. 

11.6  Concluding  remark 

Since  the  inverse  cosine  and  inverse  sine  algorithms  employ  a  version 
of  the  inverse  tangent  algorithm,  it  is  important  that  this  latter  algorithm  be 
as  fast  as  possible—it  is  nearly  as  fast  as  division. 
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12.  A  NOTE  ON  EVALUATION  OF  ARCCOSINE/ARCSINE 

Previously  developed  algorithms  suffice  to  evaluate  arccosine  or 
arcsine  for  X  e  [0,  1],  although  a  modification  of  the  initialization  is 
required.  From  the  identities 


sin'1  X  =  tan"1     -7=  (l2-l) 


r77?. 


-1    „  .  -1     f  W    l-X2  1  ,_c     _s 

cos        X  =   tan        1 - (12-2) 


sin'1  X  +  cos"1  X  =  «/2  (12-3) 

2 

it  may  be  seen  that  a  single  multiplication  to  form  X  ,    followed  by  an  appli- 


-\  2 

cation  of  a  square  root  algorithm  to  form  V  1  -  X  ,  followed  by  the  arc- 
tangent algorithm  with  a  special  initialization  procedure  to  avoid  the 
indicated  division  may  be  used  to  evaluate  arc cosine/arc sine.  The  initiali- 
zation is  simplified  by  considering  two  cases. 
Case 


>e  1:   If  Vi-x2  >  x,  (0  <  X  <  1/^2),  then  set 


R 


00 


=  ~^1-X2  e  [1/V2,  1] 


Im  =  X  e  [0,  1/V2] 


■00 

-1  -1 

and  use  (12-1)  to  compute  sin   X  and  (12-3)  to  compute  cos  '  X  if  desired. 

The  next  part  of  the  initialization  consists  of  choosing  t  _  =  1  -  j  l/2  so 

that 


RQ1  -~Vl-X2  +  1/2  X 


IQ1  =  X  -  1/2  V  l-X2. 
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Finally  choose  t   =  3  A  "to  scale  magnitudes  to  desirable  ranges. 

Ro  =  3/1+  R01  €  [3/U>  1] 

Z0  =  3/k   I01  €  ["3/8>  +3/8]  e  [~3/U'  +3AL 

Continuation  of  the  arctangent  algorithm  then  provides  the  desired  result 
■with  an  error  less  than  2 


Case  2:   If  V 1-X2  <  X,   (l/-^2  <  X  <  l),  then  set 

RQ0  =  X  €  (1/V2",  1] 


I00=  Vl-X2  e  [0,1/ Y2) 

and  continue  as  in  Case  1. 

It  may  thus  be  concluded  that  arccosine/arcsine  may  be  evaluated  in 
slightly  more  than  three  multiplication  cycle  times. 


13.   ON  A  HIGHER  RADIX  IMPLEMENTATION 

13.1  General  considerations 

One  of  the  chief  limitations  on  the  speed  with  which  the  algorithms 
developed  in  this  research  may  be  implemented  is  the  step-by-step  transposition 
in  the  control:  s  is  chosen,  then  the  appropriate  recursions  performed,  s  .. 
is  chosen,  the  recursions  performed,  and  so  on.   If  the  control  time  is  at 
all  significant,  there  are  clear  speed  advantages  in  choosing  not  only  s 
but  also  s,  ,   ...  from  comparisons  on  11  and  then  performing  several  (binary) 
steps  in  the  recursion  relations.  The  cost  of  this  higher  radix  implementa- 
tion is  a  significant  complication  of  the  required  comparisons  to  choose 
s  n,  ....   In  general,  both  an  increase  in  the  number  of  comparisons 
required  and  an  increase  in  the  precision  of  the  comparisons  would  be  expected. 

Although  higher  radix  implementations  are  not  a  subject  of  detailed 
study  in  this  research,  a  few  apparently  important  considerations  are  known 
and  it  is  desirable  to  discuss  them  here.   Only  radices  which  are  integer 
powers  of  two  are  considered. 

To  illustrate  the  general  strategy,  let  us  compare  the  recursion  for 
the  division  scheme  of  Section  3  necessary  to  choose  s  ,  s   ,  ...  with  the 
analogous  recursion  for  most  other  division  schemes. 

Vi  -  2\ +  \ +  2~\\  (13-L) 

Vi  =  2\"  V  (13"2) 

The  symbols  in  (13-I)  have  been  previously  defined;  in  (13-2),  s  is  the  k 
quotient  digit,  vl  and  il    are  partial  remainders  (u_  is  the  dividend),  and 
x  is  the  divisor.   In  either  (13-I)  or  (13-2)  the  range  of  il    is  the  same 
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as  the  range  of  vl   .  Except  for  the  first  few  steps,  the  term  2  s  xl    in 
(13-1)  contains  information  of  only  secondary  importance,  that  is,  the 
dominant  digits  of  il    are  determined  only  by  2il  and  s   (recall 
s  =  (1,  0,  1}).  Thus,  from  the  value  2u.  alone,  one  is  able  to  choose  not 
only  s  ,  but  also  s   ,  ....  From  (13-2)  it  is  seen  that,  independent  of 
the  step  index  k,  significant  information  is  contained  in  the  term  s,  x,  and 
schemes  for  higher  radix  implementation  must  take  this  effect  into  account, 
that  is,  the  recoding  is  a  function  of  the  divisor.   (Scaling  procedures 
have  been  studied  to  overcome  this  difficulty  in  the  division  represented  by 
(13-2)).   To  facilitate  the  implementation  of  a  higher  radix  procedure,  one 
would  prefer  to  have  a  recursion  which  does  not  explicitly  depend  on  the 
original  operand — except  for  the  starting  difficulty  incurred,  recursion 
(l3-l)  is  considerably  less  complicated  to  implement  in  radix  r  =  2  ,   n  >  1, 
than  is  recursion  (13-2). 

Most,  but  not  all,  of  the  algorithms  developed  in  earlier  sections 
of  this  paper  lend  themselves  to  such  a  simplified  higher  radix  implementation. 

13.2  Amenability  of  normalization  algorithms  to  higher  radix 

Once  the  first  few  steps  have  been  performed,  the  division  scheme 
proposed  in  Section  3  lends  itself  well  to  a  higher  radix  implementation. 
The  multiplication  algorithm  of  Section  h   requires  the  simple  recursion 

and  a  higher  radix  implementation  is  visibly  easier  for  multiplication  than  for 
division  since  no  starting  problem  exists  (beyond  the  actual  initialization). 

Two  square  root  algorithms  were  discussed  in  detail  in  Sections  6 
and  7J  the  recursions  of  present  interest  are  listed  below. 
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vi  ■  <2\ +  \> +  2"k<2VR +  ^  \) +  s's^s  4  V    (m-*) 
vi  ■  2\  -  */*-a +  4  2"(k+2))  (13-5) 

In  the  strictly  binary  case,  the  second  algorithm,  represented  by  recursion 
(13-5),  is  clearly  superior  since  it  is  less  complex  and  can  be  implemented 
with  less  time  and/or  hardware.  However,  it  has  the  disadvantage  of  being 
more  difficult  to  implement  in  higher  radix  because  the  recursion  is  a  strong 
function  of  R  ,  the  approximation  to  y  x.  Thus,  this  algorithm  presents  at 
least  the  same  level  of  difficulty  in  a  higher  radix  implementation  as  the 
division  represented  by  (13-2).  The  first  square  root  algorithm,  represented 
by  (l3-*0,  is  clearly  comparable  to  the  division  in  (13-I). 

It  is  perhaps  less  obvious  but  still  easy  to  show  that  the  algorithm 
for  exponential  is  readily  amenable  to  higher  radix. 

\+1  -  2\  "  ^W  +  2~(k+\> 

-  2^  -  2k+1[2-(k+l)sk  -  1/2  2-2<k+1>s2 

+  higher  order  terms ] 

=  (2u,  -s  )  +s  2       +  higher  order  terms        (13-6) 

Again,  after  some  difficulty  getting  started,  a  higher  radix  implementation 
may  be  feasible.  The  same  is  true  of  the  algorithms  for  tangent  and  cosine/ 
sine. 

_      _k+l   -1, _-(k+l)v 
uk+l  =  2\  "  sk2   tan  (2      ) 

=  2u^  -  sk2k+1[2"^k+1^  -  1/3  2"3(k+1)+  higher  order  terms] 
=  (2u,-s  )  +  1/3  s  2       +  higher  order  terms.    (l3"T) 


recursion. 
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For  the  arctangent  algorithm,  the  issue  is  less  clear.  From  the 

Vi  =  2\  +  3A  •    (13-8> 

it  appears  that  this  algorithm  is  even  more  difficult  to  implement  in  higher 

radix  than  the  division  of  (13-2)  since  R  is  not  only  a  function  of  the 

given  operand,  but  a  function  of  the  step  index  as  well.  However,  it  was 
shown  in  Section  11  that 

Rk  =  K  +  g(k) 

where  K  £  [3/^>  l)  is  a  function  only  of  the  original  operand  and  g(k)  is 

a  very  slowly  changing  function  of  the  step  index  and  is  a  second-order  effect 

beyond  the  first  few  steps.  Hence,  (13-8),  rewritten  as, 

-ul    =  2u,  +  s  K  +  higher  order  terms  (13_9) 

appears  to  present  the  same  level  of  difficulty  in  higher  radix  as  does 
(13-2). 

From  these  general  preliminary  considerations,  all  of  the  algorithms 
except  the  second  square  root  and  the  arctangent  appear  to  be  readily  amenable 
to  higher  radix  implementation. 

13.3  A  change  in  strategy 

One  may  view  a  higher  radix  implementation,  say  radix  k,    as  a 
simple  alteration  of  control  in  that  two  successive  values  s  ,  s    are 
chosen  on  the  basis  of  il  alone  without  forming  xl       .  Even  though  the 
probability  that  s,  =  0  may  approach  2/3,  the  probability  of  a  radix  h   digit, 

K 

represented  by  s  s    ,    being  zero  is  quite  small.   Furthermore,  the  digit 
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sequences  11  and  11,  representing  values  3  and  3,  are  to  be  avoided  since 
two  additions  (or  subtractions)  are  required.   In  a  radix  h   implementation, 
one  may  wish  to  limit  redundancy  by  allowing  only  digital  values  2,  1,  0,  1,  2. 

Since  the  probability  of  a  zero  in  radix  h   is  small  (about  l/U  if 
values  2,  1,  0,  1,  2  are  allowed)  and  speed  is  achieved  by  reducing  control 
time,  the  shift  average  is  no  longer  a  meaningful  measure  of  efficiency. 
Rather  the  speed  itself  or  a  speed  to  hardware  ratio  must  be  extablished. 

While  no  efficiency  studies  of  radix  k   implementations  of  these 
algorithms  have  been  made,  a  few  thoughts  in  that  direction  are  appropriate. 

13.^  Efficiency  of  radix  k   versus  radix  2 

For  as  specific  a  comparison  as  can  be  made  at  this  time,  let  us 
consider  the  multiplication  algorithm  of  Section  k   which  presents  no  starting 
problems  for  a  higher  radix  implementation.  The  time  required  to  perform 
multiplication  radix  2  is  given  by 

T2  S  M[tc  +  1/3  tA  +  tg  +  1/3  tSH] 

where 

t   =  control  sequencing  time 

t   =  addition/subtraction  time  (including  complementation) 

-H. 

t„  =  selection  time --time  required  to  select  a  value  for  s. 
S  k 

t   =  shifting  time. 
orl 

The  time  required  to  perform  multiplication  radix  h   is  given  by 

Tu  S  M/2[a  tc  +  3A  tA  +  B  tg  +  h/5   tgH] 

where  a  is  a  measure  of  the  complication  of  the  control  and  B  is  a  measure  of 
the  complication  in  the  selection  rules  caused  by  choosing  two  "multipliers" 


138 

at  once;  probably  a  ~  1,  f3  ~  2.  It  has  been  assumed  throughout  this  work 
that  complementations,  low  precision  comparisons,  and  shifting  operations 
are  much  faster  than  addition.  Thus,  / 

T2  3  M  [tc  +  1/3  tA] 


T^  ~  M  [1/2  tc  +  3/8  tA]. 


It  thus  appears  that  if  the  control  time  is  at  all  appreciable  relative  to 
the  addition  time,  a  radix  k   implementation  would  be  faster  than  radix  2: 
T2  >  1k        if   tc  >  1/12  tA. 

Similar  considerations,  but  not  necessarily  similar  conclusions, 
appear  to  apply  to  any  radix  r  =  2  ,   n  >  2. 


Ik .   CONCLUSIONS 

lU.l  A  set  of  algorithms 

A  class  of  algorithms  for  evaluation  of  certain  elementary  functions, 
suitable  for  hard-wire  implementation  in  a  scientific  binary  digital  computer, 
has  been  developed  and  studied.   It  has  been  shown  that  all  of  the  algorithms 
can  be  implemented  with  a  reasonably  economical  structure,  and  it  is  strongly 
believed  that  these  algorithms  would  be  considerably  faster  than  software 
routines  that  are  now  often  used.  The  algorithms  employ  minimal  redundancy 
to  achieve  an  increase  in  speed  over  non-redundant  versions  of  the  same  al- 
gorithms; the  cost  of  employing  redundancy  is  a  complication  of  the  initial- 
ization of  the  algorithms  and  an  increase  in  the  precision  of  the  comparisons 
required  to  choose  multiplier  constants. 

The  extension  of  the  algorithms  to  higher  radix,  restricted  to  the 

case  where  the  radix  is  a  power  of  two,  has  been  briefly  studied  with  the 

observation  that  the  recursion  upon  which  sn  ,  sn  n,  ...  are  based  should  be 

*  k7   k+17 

independent  of  the  initial  operand.  With  the  exception  of  the  arctangent, 
algorithms  are  known  for  every  function  in  the  set  that  have  this  favorable 
property.   In  contrast,  the  recursion  commonly  used  for  division, 


Vi  = r  \  -  \ x 

where  q  is  the  k   radix  r  quotient  digit,  u  and  vl         are  partial  remain- 
ders (u  is  the  dividend),  and  x  is  the  divisor,  does  not  have  this  favorable 
property. 

The  application  of  redundant  arithmetic  recodings  has  been  extended 
to  a  limited  set  of  functions.   It  is  possible  that  much  broader  classes  of 
functions  may  lend  themselves  to  approaches  similar  to  those  employed  here. 
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lU.2  Areas  of  further  investigation 
1^ . 2 . 1  Generalization  of  the  technique 

The  generalization  and  extension  of  the  "normalization"  technique 
presents,  probably,  the  most  interesting  topic  for  further  investigation. 
It  is  known  that  either  a  continued  product  cr  a  continued  summation  repre- 
sentation of  a  divisor  or  its  reciprocal  yields  two  possible  division 
algorithms.   Similarly  it  is  known  that  either  representation  leads  to  a  pair 
of  algorithms  for  square  root.   Surprisingly,  only  a  single  such  algorithm 
for  multiplication  appears  to  offer  any  practicality.  Only  a  single  algorithm 
is  known  for  each  of  the  other  functions  studied. 

The  success  in  devising  the  algorithms  discussed  in  this  report  is 
due  basically  to  three  useful  properties: 

(1)  log  (H  p.)  =  Z  log  p. 

(2)  exp  (E  SjL)  =  IT(exp  s±) 

(3)  exp  (log  x)  =  log  (exp  x)  =  x. 

It  is  not  known  what  general  class  of  functions  may  be  evaluated  through  such 
a  formulation,  or  whether  that  class  has  already  been  exhausted  in  this  paper. 
The  hyperbolic  functions  seem  likely  candidates,  but  were  not  studied  here 
because  they  can  quite  easily  be  formed  in  terms  of  exponentials. 

lU.2.2  Correspondence  between  normalization  recodings  and  classical  multiplier 
recodings 

Considerable  attention  has  been  devoted  to  the  study  of  multiplier 

recodings  and  the  correspondence  between  certain  digital  division  techniques 

1  2 

and  these  recodings.  '   It  is  clear  that  a  similar,  but  perhaps  less  direct, 

correspondence  exists  between  the  recodings  of  these  algorithms  and  the 
multiplier  recodings.  A  complete  analysis  of  this  correspondence  would  be 
quite  valuable. 


Ik . 2 . 3  Some  partial  insight? 

Perhaps  some  insight  into  both  of  the  problems  discussed  above  is 
available  even  now. 

Given  a  number  in  conventional  form, 


00 

-1 


x  =  Z  s.2   ,      S.  €  {1,  0,  1}  (1^-1) 

i=0  X         X 

one  may  recode  as 

<*» 

x  =  Z  s!  2"1,    s!  e  (1,  0,  1}  (1^-2) 

1=0  x        x 

as  in  the  division  scheme.  However,  more  generally,  one  may  recode  as 

00 

x  =  Z  s!«v(i),   s!'  e  {1,  0,  1}  (1^-3) 

i=0  x  x 

where  w(i)  represents  a  "weighting  function"  not  necessarily  equal  to  its 
"nominal"  value  of  2   .  Note  that  this  is  no  longer  a  strictly  radix  2 
recoding,  although  it  is  assumed  that  w(i)  is  on  the  order  of  2   .   In 
particular,  it  is  known  that 

s.w(i)  =  s.  tan"1(2_1)  (ik-k) 

and 

s.w(i)  =  £n(l  +  si2_1)  (1^-5) 

have  practical  application.   In  attempting  to  determine  what  class  of  functions 
may  be  evaluated  by  a  normalization  algorithm,  or  in  studying  the  properties 
of  the  resulting  recodings,  it  is  natural  to  seek  properties  of  weight  func- 
tions that  produce  algebraically  correct  recodings.   (it  is  not  to  be  inferred 
that  merely  because  a  set  of  weights  produces  an  algebraically,  correct  recoding 
the  set  has  any  practical  application,  but  it  is  assumed  to  be  a  necessary 
prerequisite.) 


ordered: 


lk2 

It  is  convenient,  first  of  all,  that  the  weights  be  positive  and  be 

0  <  w(i+l)  <  w(i).  (lh-6) 

It  would  appear  at  first  glance  that  another  convenient  property  would  be 
that  the  ratio  of  successive  weights 

w(i+l) 
v(i) 

should  be  constant,  but  some  of  those  weights  found  to  have  practical  appli- 
cation do  not  satisfy  this  property.   If  digital  values  are  limited  to  the 
set  {1,  0,  1},  then  one  would  certainly  like  to  have  the  property  that  the 
k   weight  satisfies 

00 

w(k)  <   £  w(i).  (lh-1) 

i=k+l 

The  special  class  of  weights 

w(i)  <p_i,   1<  p  <  3  (lk-8) 

may  be  seen  to  be  acceptable,  but  this  class  does  not  include  those  sets  of 
weights  found  to  have  practical  application.  A  second-order  perturbation 
solves  this  problem: 

w(i)  =  p"1  -  f(i),   1  <  p  <  3  (1^-9) 

where 

|f(i)|  «P_1 

/  \*  -i 

w(i)  <3  . 


*  With  only  three  digital  values,  one  cannot  recode  in  a  radix  greater  than 
three. 


1^3 

With  this  perturbation,  the  weights  represented  in  equations  (lU-lj-)  and  (lU-5) 
fall  into  this  class.  The  classes  of  weights  represented  by  (1^-8)  and  (lU-9) 
also  suggest  the  property  that 

4^1  €  [1/3,1). 

•  Certainly  no  answers  to  the  problems  discussed  above  have  been 
given,  but  perhaps  an  insight  into  a  direction  of  study  has  been  given. 

Ik . 2 . k     A  different  radix  h   approach 

In  Section  13  it  was  suggested  that  one  method  of  generating  a 

radix  k   implementation  was  to  simply  choose  two  successive  binary  digital 

values  sn  ,  sn  _  at  once,  probably  limiting  the  radix  k   digital  value  s,   ,  , 
k   k+1        '  k,  k+1 

to  one  of  the  set  {2,  1,  0,  1,  2} .  A  natural  additional  step  would  be  to 

change  the  weight  functions  to  be  of  the  form  k       or  tan  (h     )  or 

#n(l  +  s.   .  ^h     ).     Such  a  modification  would  eliminate  roughly  half  of  the 
1,  l+l  to 

stored  constants. 

1^ . 2 . 5  Some  practical  matters 

As  practical  matters,  development  of  actual  hardware  control 
circuitry  (or  perhaps  equivalent  microprogram  control)  and  an  "optimization" 
of  initialization  steps  remain  for  anyone  seriously  interested  in  implementing 
these  algorithms  in  a  machine. 
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APPENDIX  A:   MONTE  CARLO  ESTIMATE  OF  THE  PROBABILITY  OF  A  ZERO 

A.l  Confidence  in  estimate  of  probability  of  a  zero 

Although  the  theoretical  asymptotic  value  of  the  probability  of  a 
zero  is  known,  one  would,  as  a  practical  matter,  have  to  have  an  estimate 
for  a  register  of  finite  length;  in  particular,  a  register  length  of  Uo  was 
chosen.  ■ 

Following  the  discussion  given  by  Cochran,  let  p  be  the  estimat 
probability  of  a  zero  and  P  be  the  actual  probability.  Further,  let  d  be 
the  margin  of  error  in  p  ,  and  let  OC   be  the  risk  that  the  actual  error  is 
larger  than  d. 

a  =   Prob(|p0  -  PQ|  >  d) 

If  p  is  taken  as  normally  distributed,  then  the  standard  deviation  a       is 
given  by 


I  po^1"po^ 

where  N  is  the  population  size  (N  =  2   for  a  register  length  of  Uo)  and  n 

1  ft 
the  sample  size  (n  =  2  ).  The  formula  that  connects  n  with  the  degree  of 

precision  is 


where  t  is  the  abscissa  of  the  normal  curve  that  cuts  off  an  area  Q;  at  the 
tails;  solving  for  n, 

a2 

n  = 


[ffo^fol 

■  +  N  [2 ■ 


Ikk 


1*5 

For  practical  use,  the  estimate  pQ  must  be  used.   Since  n  <  <  N, 

t2p0(l-P0)    Pq(1-Pq) 

n  =  — -j =  — 

d 

2  2 
where  V  =  d  /t  =  variance  of  the  sample.  Thus,  the  sample  variance  is 


v  ,  Pq^-Pq)  ,  (2/3K1/3)  =  q.8,8  x  10-6 

n         ^lo 


and  the   sample   standard  deviation  is 

~\Jv~=   0.920  x  10"3. 

With  99^  confidence,  one  may  say  that  the  actual  deviation  is  within  about 
2.58  ~\JY~   2.37  x  10~3;  with  99.9$  confidence,  one  may  say  that  the  actual 
deviation  is  within  about  3.29  V^~-  3-03  x  10~3.  Thus,  for  a  sample  of  2  , 
P  is  known  accurately  enough  for  the  purposes  of  this  research. 

A. 2  Generation  of  pseudo-random  double  precision  operands 

The  IBM  360/75  computer  system  software  package  at  the  University 
of  Illinois  contains  a  single  precision  pseudo-random  number  generator 
employing  a  version  of  Lehmer's  method.  The  subroutine  generates  pseudo- 
random numbers  using  the  recurrence  relations 

a.       =  2  % 
J+1      J 

PJ+1  =  51TPJ(mod  2k2),    PQ  =  1 

with  a  period  of  approximately  2   .   Since  a  sample  size  of  2   was  chosen, 
each  of  the  random  operands  in  the  sample  is  unique.  Two  single  precision 
operands  are  required  to  provide  each  double  precision  operand. 


11+6 

2 
As  described  by  Schreider,   the  chi-square  and  Kolmogorov  tests  of 

uniformity  of  random  numbers  are  the  most  commonly  accepted  criteria  for 

accepting  or  rejecting  a  set  of  pseudo-random  numbers.  A  sample  of  ten  sets 

10 
of  2   double  precision  pseudo-random  numbers  passed  both  of  these  tests. 

A. 2.1  The  chi-square  test 

The.  chi-square  test  is  based  on  the  statistic 


2 


N  ( v . -np . ) 


2 


y   i  i 

1  =  1  *1 

where  V.  is  the  sample  number  of  objects  in  the  i   interval,  np  is  the 
l  *± 

expected  value  of  v.,  and  N  is  the  number  of  intervals  chosen.  The  values 
N  =  10,  n  =  2  ,  pj[  =  1/10,  i  =  1,  2,...,  10  were  used.  Thus, 

2   10  (V.-102.U)2 
X   =i=l    1°2.U    • 

The  hypothesis  of  uniform  distribution  of  the  pseudo-random  numbers  is 

2  2 

rejected  if  %     exceeds  the  upper  limit  X^  -,  /  \   °^   the  confidence  interval, 

where  p  is  the  assigned  confidence  probability  and  N  -  1  is  the  number  of 

2 

degrees  of  freedom.   In  this  test,  p  =  0.95  was  chosen,  so  that  values  of  X 

should  not  exceed  xQ/  Qt-\  =  l6.9. 

2 
Extremely  small  values  of  X  are  considered  an  indication  of  failure 

of  randomness.   The  critical  region  of  acceptance  of  the  hypothesis  is  taken 

to  be 


[%-l(l-p)'  %-l(P)]  =  [3*33'  l6'9]- 


*  A  confidence  level  p  means  a  probability  close  to  one  such  that,  if  the 
hypothesis  of  uniform  distribution  is  correct,  the  probability  is  p  that 
the  value  obtained  for  X<=  will  not  exceed  X^(p). 


ik-j 

2 
In  the  case  of  p  =  0.95,  "the  probability  that  X  lies  outside  the 

above  interval  is  far  from  negligible,  about  one  trial  in  ten.  Ten  samples 

2 
were  tested;  the  values  of  x  for  the  test  samples  are  listed  below.  Since 

only  one  trial  falls  outside  the  critical  region,  the  hypothesis  is  accepted, 


TABLE 

IT 

Trial 

£ 

1 

6.90 

2 

7.80 

3 

5.36 

k 

11.  in 

5 

k.ld 

6 

J+.22 

7 

13.21 

8 

13.62 

9 

21.1*3 

10 

9.09 

A. 2. 2  The  Kolmogorov  Criterion 

Kolmogorov's  criterion  is  based  on  the  statistic 


m 
x 


D  =  maxl—  -  F(x) 
n      '  n    v  ' 


where  n  =  2   is  the  size  of  the  sample,  m  is  the  number  of  objects  in  the 

-A. 

sample  not  exceeding  the  value  x,  and  F(x)  is  the  theoretical  cumulative 
probability  function. 


IhQ 


Schreider  suggests  rejecting  the  hypothesis  of  "uniformity"  if 

1/2 
n   Dn  falls  outside  the  range  (0.5,  1.5).  The  table  below  lists  the  values 


1/2 

of  n  '  D  for  ten  test  trials. 


n 


TABLE 

18 

Trial 

n  '  D 
n 

1 

0.537 

2 

0.7UU 

3 

0.319 

k 

O.613 

5 

O.638 

6 

0.519 

7 

0.713 

8 

O.825 

9 

1.119 

10 

0.956 

1/2 

Since  for  nine  of  ten  trials  the  value  of  n  '  D  falls  in  the  desired  range, 

n  D  ' 

the  hypothesis  is  accepted.   It  may  be  noted  qualitatively,  however,  that 
the  data  indicate  that  the  numbers  generated  tend  to  be  somewhat  more  nearly 
"uniform  than  would  be  expected  of  purely  random  numbers. 


14-9 

A. 2. 3  Listing  of  raw  data  for  statistical  tests 


TABLE 

19 

X 

* 
m 

X 

1 

2 

-2_ 

4 

_i_ 

6 

7 

8 

9 

10 

0.1 

■  109 

105 

100 

92 

108 

88 

101 

128 

100 

97 

0.2 

218 

199 

195 

191 

211 

191 

200 

227 

185 

203 

0.3 

319 

301 

299 

297 

313 

291 

308 

330 

307 

307 

0.4 

422 

401 

4l8 

4l8 

430 

393 

401 

436 

422 

419 

0.5 

529 

504 

516 

520 

529 

499 

506 

531 

508 

538 

0.6 

618 

605 

617 

634 

620 

607 

611 

618 

593 

645 

0.7 

709 

693 

708 

721 

722 

711 

715 

715 

681 

735 

0.8 

802 

815 

809 

812 

819 

821 

842 

812 

802 

836 

0.9 

918 

927 

915 

927 

924 

915 

920 

931 

908 

940 

1.0     1024   1024   1024   1024   1024   1024   1024   1024   1024   1024 
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*  Ten  trials  for  m  are  listed. 
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APPENDIX  B:   DISCUSSION  OF  THE  CHOICE  OF  MULTIPLIER 
CONSTANTS  IN  THE  COSINE/SINE  ALGORITHM 


The  cosine/sine  algorithm  .employs  the  identity 


e   =  e 


M 
j(x  -  Z  q>  ) 

i=0 


M 


n  • 

1=0 

t. 

i 

M 

n 

l*«l 

i=0 


where   {cp.}    must  be  chosen  such  that 

M 
x  -  Z  cp.    =0. 
i=0  X 


Since  the  cosine  and  sine  are  required,  one  must  evaluate  T  = 
where 


Rk  +  J  h 


K     =     Re  \    1 

k      \  k-1 


fk-1 

nt 

i=0  x 


it  t. 
i-o  x'  / 


k-l 

n  t, 


Ik  =  Im  \£T 


n  t. 

i=0  x 


Given  T  ,  one  must  compute  the  real  and  imaginary  parts  of  T   .  Let  us 


k+1" 


consider  the  following. 


(l)     Write 


k       t 


T  =  IT     t—^- 

k+1     . \  ItT 
1=0    '    i 


=  IT  (cos  cp.    +  j   sin  cp. ). 

i=0  X  1 


150 


151 
Then, 


Tk+1  =  (\   COS  ^k  "  Ik  Sin  ^  +  ^\   COS  ^k  +  Rk  Sin  %J* 

Since  it  is  not  possible,  in  general,  that  both  cos  q>  and  sin  qi  have  only- 
one  non-zero  bit,  this  is  not  an  efficient  recursion  to  perform. 


(2)     Write 


T.    .   =  n  (t.   X.) 
k+1     i=0     X     x 


where  X.    = 


1  t.    ' 


Then, 


\+i  ■  (\  rk  -  h  \K +  ith  \ +  Rk  \K 


i- 


2  2 

where  tn    =  rn    +  j   i,  .      Since  X,  ~\/r,    +  i,    =  1.    it   is  not  possible,    in 
kkk  k    V    k         k         ' 


general,   that  rn  ,    i,  ,   and  X,    each  have  only  one  non-zero  bit.     Hence,   the 
7  k       k'  k  ' 

recursion  cannot  be  performed  efficiently  in  this  formulation. 
(3)     As  a  variant  of  the  above,    let  us   choose 

rk  =  1 

k         k 


so  that 


xk  = 


V 


-  S2  2-2<k+1> 


2 
Then  the  necessary  recursions  can  be  performed  if  s  =  1  and  IT  X.  is 

precomputed.  That  for  k  sufficiently  large,  a  simple  approximation  for 

X,  suffices  has  been  shown  in  Section  10. 
k 


APPENDIX  C:  LISTING  OF  FRECOMPUTED  CONSTANTS 

All  precomputed  constants  required  by  the  algorithms  presented  in 
this  paper  are  listed  below;  sufficient  accuracy  for  a  mantissa  of  length  i+0 
is  retained. 


TABLE  20 

Special  Constant 

* 

Arroroximate  Value 

jt 

3.1U1592653589T9 

n/2 

1.57079632679^90 

n/k 

0.785398163397^5 

Tt/8 

O.39269908169872 

I/Y2" 

O.707IO678II8655 

K1 

O.88706U 17837978 

K'  » 

0.367U3^01338025 

tan  it/8 

O.U1U21356237310 

*  Note  that  the  values  of  KT  and  K1 '  are  functions  of  M. 

152 


153 


TABLE  21 


-l 


3.2 


-l 


1 

0.50000000000000 

1.50000000000000 

2 

0.25000000000000 

O.75OOOOOOOOOOOO 

3 

0.12500000000000 

O.375OOOOOOOOOOO 

k 

0.06250000000000 

O.I875OOOOOOOOOO 

5 

0.03125000000000 

O.O9375OOOOOOOOO 

6 

0.01562500000000 

0. 01+687500000000 

7 

0.00781250000000 

O.O23I+375OOOOOOO 

8 

0.00390625000000 

0.01171875000000 

9 

0.00195312500000 

O.OO5859375OOOOO 

10 

0.00097656250000 

0.00292968750000 

n 

0.0001+8828125000 

0.0011+61+81+375000 

12 

0.00021+1+ 11+ 062500 

0.0007321+2187500 

13 

0.00012207031250 

0.00036621093750 

11+ 

0.00006103515625 

O.OOOI83IO5 1+6875 

15 

0.00003051757813 

0. 0000915  52731+38 

16 

0.00001525878906 

O.OOOOI+577636719 

IT 

0.000007629391+53 

0.00002288818359 

18 

0.000003811+69726 

0.0000111+1+1+09180 

19 

0.00000190731+863 

0.00000572201+590 

20 

0.000000953671+32 

0.00000286102295 

21 

0.0000001+7683716 

O.OOOOOIU305III+7 

22 

0.000000238U1858 

0.0000007152557^ 

23 

0.00000011920929 

O.OOOOOO35762787 

2U 

0.000000059601+61+ 

0.00000017881393 

25 

0.00000002980232 

O.OOOOOOO89I+O697 

(Continued 

0 

15k 


TABLE   21  (Continued) 


-i 


2*2 


-i 


26 

O.OOOOOOOll+901l6 

0.0000000UU7031+8 

27 

O.000000007U5058 

0.0000000223 5 lib 

28 

O.OOOOOOOO372529 

0.00000001117587 

29 

O.OOOOOOOOI86265 

0. 000000005 5879U 

30 

O.OOOOOOOOO93132 

0.00000000279397 

31 

0.OOOOOOOOOU6566 

0.00000000139698 

32 

0.00000000023283 

0.000000000698U9 

33 

0.000000000116^2 

0.0000000003U925 

3^ 

0.00000000005821 

0.000000000171+62 

35 

0.00000000002910 

0.00000000008731 

36 

0.00000000001^55 

O.OOOOOOOOOOU366 

37 

0.00000000000728 

0.00000000002183 

38 

0.0000000000036U 

0.00000000001091 

39 

0.00000000000182 

0. 000000000005 U6 

Uo 

0.00000000000091 

0.00000000000273 

in 

0.000000000000U5 

0.00000000000136 

42 

0.00000000000023 

0.00000000000068 

^3 

0.00000000000011 

0. 0000000000003 k 

kh 

0.00000000000006 

0.00000000000017 

h5 

0.00000000000003 

0.00000000000009 
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TABLE   22 

i 

ln(l+2~1) 

ln(l-2_1) 

1 

O.I+O5I+65IO8IO816 

-0.6931U718055995 

2 

0.2231U355131U21 

-O.287682072U5178 

3 

0.11778303565638 

-O.I3353139262I+52 

k 

O.O6062U62I816U3 

-O.O6U5 3852113757 

5 

O.O3O77165866675 

-O.O317U869831U58 

6 

0.015501+18653597 

-O.OI57I+83569681U 

7 

O.OO77821U0UU2O5 

-O.OO78U3177U6IO3 

8 

O.OO38986U0I+I566 

-O.OO391389932III+ 

9 

0.00195122013126 

-O.OOI95503U8358O 

10 

O.OOO97608597306 

-O.OOO9770396I+783 

11 

O.OOOl+88l6207950 

-O.OOOU88I+OOU98H 

12 

0.00021+1+11082753 

-0.000241+170^3217 

13 

0.00012206286253 

-O.OOOI2207776369 

11+ 

0.00006103329368 

-O.OOOO6IO37OI897 

15 

0. 0000305 171121+7 

-O.OOOO305180I+38O 

16 

O.OOOOI525867265 

-O.OOOOI5258905I+8 

IT 

O.OOOOO7629365I+3 

-O.OOOOO7629I+236I+ 

18 

O.OOOOO381U68999 

-0.0000038ll+70l+5l+ 

19 

O.OOOOOI9O73I+68I 

- O.0000019073 5 Ok  5 

20 

O.OOOOOO95367386 

-O.OOOOOO95367I+77 

21 

O.OOOOOOI+768370I+ 

-0.000000I+7683727 

22 

O.OOOOOO238U1855 

-0.000000238I+1861 

23 

0.00000011920928 

-0.00000011920930 

2^ 

0.000000059601+64 

-0.00000005960I+65 

25 

0.00000002980232 

-0.00000002980232 

(Continued 

) 
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TABLE  22  (Continued) 


-l- 


lu(l+2  ) 


-l. 


ln(l-2  x) 


26 

0.000000011+90116 

-0.000000011+90116 

27 

O.OOOOOOOO7I+5058 

-0.0000000071+5058 

28 

0.00000000372529 

-0.00000000372529 

29 

O.OOOOOOOOI86265 

-0.00000000186265 

30 

O.OOOOOOOOO93132 

-0.00000000093132 

31 

0.OOOOOOOOOI+65  66 

-0.0000000001+6566 

32 

0.00000000023283 

-0.00000000023283 

33 

0.0000000001161+2 

-0.0000000001161+2 

3^ 

0.00000000005821 

-0.00000000005821 

35 

0.00000000002910 

-0.00000000002910 

36 

O.OOOOOOOOOOII+55 

-0.000000000011+55 

37 

0.00000000000728 

-0.00000000000728 

38 

O.OOOOOOOOOOO36I+ 

-0.0000000000036U 

39 

0.00000000000182 

-0.00000000000182 

1+0 

0.00000000000091 

-0.00000000000091 

kl 

0.0000000000001+5 

-O.OOOOOOCOOOOOl+5 

1+2 

0.00000000000023 

-0.00000000000023 

^3 

0.00000000000011 

-0.00000000000011 

1+1+ 

0.00000000000006 

-0.00000000000006 

^5 

0.00000000000003 

-0.00000000000003 
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TABLE  23 

i 

arctan  (2   ) 

1 

O.U636U7609OOO8I 

2 

0.24^97866312686 

3 

O.I2U35I+99U54676 

k 

O.O62U1880999596 

5 

0.03l2398331+3027 

6 

0.015623728620^8 

7 

0.0078123^106010 

8 

0.00390623013197 

9 

0.001953122516U8 

10 

O.OOO97656218956 

11 

0.000^8828121119 

12 

0.0002U1+1U062015 

13 

0.00012207031189 

Ik 

0.00006103515617 

15 

0.00003051757812 

16 

O.OOOOI525878906 

17 

O.OOOOO762939U53 

18 

0.0000038li+69726 

19 

0.0000019073^863 

20 

O.OOOOOO95367U32 

21 

O.OOOOOOU7683716 

22 

O.OOOOOO238U1858 

23 

0.00000011920929 

2k 

0.00000005960U6^ 

25 

0.00000002980232 

(Continued) 
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TABLE  23  (Continued) 


i 

arctan  (2  ) 

26 

O.OOOOOOOIU9OH6 

27 

O.OOOOOOOO7U5058 

28 

0.00000000372529 

29 

O.OOOOOOOOI86265 

30 

0.00000000093132 

31 

O.OOOOOOOOOU6566 

32 

O.OOOOOOOOO23283 

33 

0.000000000116U2 

3^ 

0.00000000005821 

35 

0.00000000002910 

36 

0.00000000001^55 

37 

0.00000000000728 

38 

O.OOOOOOOOOOO36U 

39 

0.00000000000182 

1+0 

0.00000000000091 

Ul 

0.000000000000^5 

U2 

0.00000000000023 

^3 

0.00000000000011 

41+ 

0.00000000000006 

^5 

0.00000000000003 

APPENDIX  D:     LISTING  OF  PARTIAL  SIMULATION  PROGRAM  CODE 


//  EXEC 

ASM 

//ASM.SYSIN  00 

* 

LML 

BEGIN 

LM 

1,2,0(1) 

I 

5*0(1) 

L 

7, MASK 

NR 

5,7 

L 

7, EX 

XR 

5,7 

* 

TM 

1(1),X'80» 

BO 

ONE 

TM 

1(1),X«40« 

BO 

TWO 

TM 

1(1),X«20« 

$ 

BO 

THREE 

A 

5, CTWO 

B 

STORE 

THREE 

A 

5, CONETH 

B 

STORE 

TWO 

A 

5, CONETW 

B 

STORE 

ONE 

A 

5, CONEON 

STORE 
* 

ST 

5,0(2) 

SR 

8,8 

ST 

8,4(2) 

LEAVE 

CNOP 

0,8 

MASK 

DC 

X'FFOOOOOO* 

EX 

DC 

X'TFOOOOOO1 

CTWO 

DC 

X,02800000' 

CONETH 

DC 

X'02400000« 

CONETW 

DC 

X«02200000« 

CONEON 

DC 

END 

X«02100000' 

/* 

//  EXEC 

ASM 

//ASM.SYSIN  DD 

* 

GEN 

BEGIN 

LM 

1,2,0(1) 

LR 

10,2    SAVE 

LR 

9,1 

L 

2,0(2) 

ST 

2, IN 

CALL 

RN3INZ,( IN) 

L 

6, END 

L 

7,C0MP 

L 

8, SET 

SR 

5,5 

BGD 

CALL 

IRN3Z 

NR 

0,7 

OR 

0,8 

ST 

0,0(9,5) 

CALL 

IRN3Z 

ST 

0,0(10)    S 

LOAD   ADDRESSES    OP    X    AND   Y 

LOAD   X    INTO   R5 

LOAD  MASK  INTO  R7 

SET  FRACTIONAL  PART  OP  X  TO  ZERO 

LOAD  EX  INTO  R7 

COMPLIMENT  CHARACTERISTIC  SETTING  SIGN 

TO  ZERO  ,IN  R5 

TEST  FRACTIONAL  PART 

GO  TO  •ONE'  IF  GREATER  THAN  0.5 

GO  TO  •TWO*  IP  GREATER  THAN  0.25 

GO  TO  'THREE'  IF  GREATER  THAN  0.125 
GOES  HERE  IF  LESS  THAN  0.125 
FORM  2**-E  IN  R5 


PLACE  2**-E  IN  STORAGE  LOCATION  ASSIGNED 
TO  Y 


LOAD  ADDRESSES  INTO  Rl  AND  R2 
LAST  RANDOM  NUMBER  -  NEW  IX 

LOAD  INITIAL  RANDOM  NUMBER  INTO  R2 

ENTER  INITIAL  RANDOM  NUMBER  INTO  RAN3Z 
LOAD  8192  INTO  R6 
LOAD  'AND*  MASK  INTO  R7 
LOAD  'OR'  MARK  INTO  R8 
SET  R5  TO  ZERO 

GENERATE  FIRST  HALF  OF  RANDOM  NUMBER 
SET  FIRST  8  BITS  TO  ZEROS 
SET  FIRST  8  BITS  TO  01000000... 
STORE  FIRST  HALF  OF  RANDOM  NUMBER 
GENERATE  SECOND  HALF  OF  RANDOM  NUMBER 
AVE  LAST  RANDOM  NUMBER  -  NEW  IX 
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END 

DEC 

COMP 

SET 

IN 

/* 

//  EX 
//FOR 
C 

C 


SLL 

ST 

A 

CLR 

BNE 

LEAVE 

DC 

DC 

DC 

DC 

OS 

END 


0*1 

0,4(9,5) 
5, DEC 
5,6 
BGD 

F«8192» 
F»8« 

X'OOFFFFFF1 
X« 40000000 • 
IF 


SHIFT  SECOND  HALF  1  §IT  LEFT 

STORE  SECOND  HALF  OF  RANDOM  NUMIIR 

AOD  •  TO  R5 

COMPARE  RS  TO  R* 

GO  TO  86D  IF  RS  NOT  EQUAL  A* 


(40,0) 


EC  FORTLKGO,TIME.GO« 
T.SYSIN  DD  * 
FORTRAN  PARTIAL  SIMULATION  CODE 
OPERAND  VALUE  IS  X 
IMPLICIT  REAL*8( A-H,L,0-Z,$) 

DIMENSION  TW0(100),TTW0(100),LNPLUS(60),LNMINS(60),ARCTAN(60) 
RAND (1024) 

,NMZER0(20),NMMINS(20) 

,NDZERO(20),NDMINS(20) 

,NEZER0(20),NEMINS(20) 

,NSZERO(20),NSMINS(20) 

,NTZER0(20),NTMINS(20) 

,NAZER0(20),NAMINS(20) 

,NCZER0(20),NCMINS(20) 

,N0ZERO(20),N0MINS(20) 


11 
12 
14 
15 
16 
18 
20 
21 
24 
25 
26 
27 
28 
29 
30 
36 
38 
40 
41 
42 
43 
44 
45 
46 
47 
48 
49 
51 


DIMENSION 
DIMENSION 
DIMENSION 
DIMENSION 
DIMENSION 
DIMENSION 
DIMENSION 
DIMENSION 
DIMENSION 
LISTING  OF 


NMPLUS(20) 
NDPLUS(20) 
NEPLUS(20) 
NSPLUS(20) 
NTPLUS(20) 
NAPLUS(20) 
NCPLUS(20) 
N0PLUS(20) 
FORMATS 


F0RMAKZ16) 

F0RMAT(I5,D28.16) 

F0RMAT(D28.16) 

FORMAT! •  THE  VALUES  NPLUS,NZERO,NMI NUS  •) 

FORMAT(3I15) 

F0RMAK2D28.16) 

FORMAT (1 15) 

FORMAT(«  THE  VALUE  OF  IX  TO  USE  NEXT  •) 

MULTIPLICATION:    PLUS, ZERO, MINUS  •) 
DIVISION/LOGARITHM:   PLUS, ZERO, MINUS  • 
EXPONENTIAL:   PLUS, ZERO, MINUS  •) 
SQUARE  ROOT:   PLUS,ZERO,MINUS  •) 
TANGENT/COTANGENT:   PLUS , ZERO, MINUS  •) 


FORMAT ( 

FORMAT( 

FORMAT ( 

FORMAT( 

FORMAT ( 

FORMAT 

FORMAT 


(•  ARCTANGENT: 
(•  COSINE/SINE 

F0RMAT(4I15) 

F0RMAT(3D28.16) 

F0RMAT(4D28.16) 


PLUS, ZERO, MINUS 
PLUS,ZERO,MINUS 


•  ) 
•  ) 


FORMAT 
FORMAT 
FORMAT 
FORMAT 
FORMAT 
FORMAT 
FORMAT 
FORMAT 
FORMAT 
FORMAT 


(• 
(• 
(  • 
(• 
(  • 
(  • 
(  • 
(• 
(  • 
f  • 


•  ) 
i 


LN2 

PI 

PI2 

PI4 

PI8 


TWO( I) 

TTWO(I) 

LNPLUS(I) 

LNMINS(I) 

ARCTAN(I) 

•  ) 
•) 

•  ) 
•) 

•  ) 


•  ) 

•  ) 

•  ) 
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69 

70 
71 
72 
73 
74 
75 
76 
77 


60 


61 


62 


63 


64 


FORMAT 

FORMAT 

FORMAT 

FORMAT 

FORMAT 

FORMAT 

FORMAT 

FORMAT 

FORMAT 

READ  C 

READ(5 

READ<5 

READ(5 

READ(5 

READ(5 

READ(5 

REA0(5 

READ(5 

READ(5 

READ(5 

WRITE 

DO  60 

WRITE 

WRITE 

DO  61 

WRITE 

WRITE 

DO  62 

WRITE 

WRITE 

DO  63 

WRITE 

WRITE 

DO  64 

WRITE 

WRITE 

WRITE 

WRITE 

WRITE 

WRITE 

WRITE 

WRITE 

WRITE 

WRITE 

WRITE 

DO  100 

NMPLUS 

NMZERO 

NMMINS 

NDPLUS 

NDZERO 

NDMINS 

NEPLUS 

NEZERO 

NEMINS 

NSPLUS 

NSZERO 


(•  THE  PROBABILITY  OF  A  ZERO 
«•    MULTIPLICATION    •) 
DIVISION    •) 
EXPONENTIAL    •) 
FIRST  SQUARE  ROOT    ») 
TANGENT/COTANGENT    • ) 
ARCTANGENT    ') 
COSINE/SINE    •) 
(  •    SECOND  SQUARE  ROOT    •] 
ONSTANTS 


tl 
♦  1 
tl 
tl 
tl 
tl 
tl 
tl 
tl 
tl 
(6 
I 

(6 
(6 
I 

(6 
(6 
I 

(6 
(6 
I 

(6 
(6 
I 

(6 
(6 
(6 
(6 
(6 
(6 
(6 
(6 
(6 
(6 
(6 
J 
(J 
(J 
(J 
(J 
(J 
(J 
(J 
(J 
(J 
(J 
(J 


(TWO( I) ,1=1,60) 
(TTW0U),I  =  l,60) 
(LNPLUS( I), 1=1, 50) 
(LNMINSt I), 1=1, 50) 
(ARCTAN( I), 1=1, 50) 
LN2 
PI 
PI2 
P14 
PI8 
41) 

1,60 
12)  ItTWOU) 
42) 

1,60 
12)  I,TTWO(I) 
43) 

1,50 
12)  ItLNPLUS(I) 
44) 

1,50 
12)  I,LNMINS(I) 
45) 

1,50 
12)  ItARCTAN(I) 
46) 
14) 
47) 
14) 
48) 
14) 
49) 
14) 
51) 

14)  PI8 
=  ltl6 
=  0 
=  0 
=  0 
=  0 
=  0 
=  0 
=  0 
=  0 
=  0 
=  0 
=  0 


LN2 


PI 


PI2 


PI4 


100 


201 
202 
203 
204 
250 

255 
256 
257 
258 
300 


309 
310 
311 


162 
NSMINS(J) 

NTPLUS(J) 

NTZERO(J) 

NTMINS(J) 

NAPLUS(J) 

NAZEROU) 

NAMINS(J) 

NCPLUS(J) 

NCZERO(J) 

NCMINS(J) 

NOPLUS(J) 

NQZERO(J) 

NQMINS(J) 

CONTINUE 

IX  =  7867 

DO  99200 

00  99999 

DO  99000 

WRITE<6,2 

WRITE<6,2 

Z  =  RAND( 

IF(Z.GE.O 

IFiZ.GE.O 

IF(Z.LT.O 

IF(Z.LT.O 

IF(Z.LT.O 

IF(Z.LT.O 

J  =  1 

GO  TO 

J  =  2 

GO  TO 

J  =  3 

GO  TO 

J  =  4 

GO  TO 

IFiZ.LT.O 

IF(Z.LT.O 

IF(Z.LT.O 

IF(Z.LT.O 

J  =  5 

GO  TO 

J  =  6 

GO  TO 

J  =  7 

GO  TO 

J  =  8 

GO  TO 

IF(Z.GE.O 

IF(Z.LT.O 

IF(Z.LT.O 

IFIZ.LT.O 

IF(Z.LT.O 

J  =  9 

GO  TO  500 

J  =  10 

GO  TO  500 

J  =  11 


0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 
0 


1353 

NNNN  ■  It  16 

MX  =  1,2 

MMM  =  1,8 

1) 

0)  IX 

KKK) 

.500)  GO  TO  300 

.25)  GO  TO  250 

.0625)  GO  TO  201 

.125)  GO  TO  202 

.1875)  GO  TO  203 

.25)  GO  TO  204 


500 


500 


500 


500 


.3125)  GO  TO  255 
.375)  GO  TO  256 
.4375)  GO  TO  257 
.500)  GO  TO  258 


500 


500 


500 


500 


.750)  GO  TO  350 
.5625)  GO  TO  309 
.625)  GO  TO  310 
.6875)  GO  TO  311 
.75)  GO  TO  312 
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60  TO  500 

312  J  -  12 

GO  TO  500 

350  IF1Z.LT. 0.8175)  GO  TO  363 
IF(Z.LT. 0.875)  GO  TO  364 
IFtZ.LT. 0.9375)  GO  TO  365 
IFU.LE. 1.000)  GO  TO  366 

363  J  *  13 

GO  TO  500 

364  J  «  14 

GO  TO  500 

365  J  *  15 

GO  TO  500 

366  J  =  16 
500  CONTINUE 

C      RANDOM  NUMBER  GENERATOR 

CALL  GEN(RAND,IX) 
C      MULTIPLICATION  ALGORITHM 

X  =  0.5*RAND(KKK)  +  0.5 

NMINUS  =  0 

NZERO  =  0 

NPLUS  =  0 
C      INITIALIZATION 

IF  (X.LT.0.75)  GO  TO  1150 

X  =  X  -  1.0 

GO  TO  1160 
1150  X  =  X  -  0.5 
1160  CONTINUE 

NZERO  =  NZERO  ♦  1 

DO  1400  I  =  lt40 

CCM  =  -  TTW0( 1+2) 

IF  (X.GE.CCM)  GO  TO  1200 

X  =  X  +  TWOd  ) 

NPLUS  =  NPLUS  +  1 

GO  TO  1400 
1200  CCP  =  -»-TTWO(  1+2) 

IF  (X.GE.CCP)  GO  TO  1300 

NZERO  =  NZERO  +  1 

GO  TO  1400 
1300  X  =  X  -  TW0( I ) 

NMINUS  =  NMINUS  +  1 
1400  CONTINUE 

NMPLUS(J)  =  NMPLUS(J)  +  NPLUS 

NMZERO(J)  =  NMZERO(J)  +  NZERO 

NMMINS(J)  -  NMMINS(J)  +  NMINUS 
C      DIVISION  ALGORITHM 

X  =  0.5  *  RAND(KKK)  +  0.5 

NMINUS  =  0 

NZERO  =  0 

NPLUS  =  0 
C      INITIALIZATION 

IFU.LT.0.75)  GO  TO  1450 

GO  TO  1460 
1450  X  =  X  *  2.000 
1460  NZERO  »  NZERO  ♦  1 

DO  2000  I  *  1,40 

CCM  *  1.0  -  TTW0U+2) 


16k 

IF(X.GT.CCM)  GO  TO  1500 

X  ■  X  *  (LO  +  TWOUJ) 
NPLUS  »  NPLUS  ♦  1 
GO  TO  2000 
1500  CCP  ■  1.0  +  TTWOU+2) 

IF(X.GT.CCP)  GO  TO  1700 
NZERO  ■  NZERO  ♦  1 
GO  TO  2000 

i7oo  x  ■  x  *  d.o  -  Twonn 

NMINUS  «  NMINUS  ♦  1 
2000  CONTINUE 

NDPLUS(J)  ■  NDPLUS(J)  ♦  NPLUS 

NOZERO(J)  >  NDZERO(J)  ♦  NZERO 

NDMINS(J)  *  NDMINS(J)  ♦  NMINUS 
:      EXPONENTIAL  ALGORITHM 

X  *  (2.0  *  RAND(KKK)  -  1.0)  *  LN2 

NMINUS  =  0 

NZERO  =  0 

NPLUS  =  0 
:      INITIALIZATION 

IFU.LT.0.5)  GO  TO  3100 

X  =  X  -  0.5 

NPLUS  =  NPLUS  ♦  1 

GO  TO  3800 
3100  IF(X.LT.0.25)  GO  TO  3200 

X  =  X  -  0.25 

NPLUS  =  NPLUS  +  1 

GO  TO  3800 
3200  IFU.LT.-0.25)  GO  TO  3300 

NZERO  =  NZERO  ♦  1 

GO  TO  3800 
3300  IFU.LT.-0.5)  GO  TO  3400 

X  =  X  +  0.25 

NMINUS  =  NMINUS  +  1 

GO  TO  3800 
3400  X  =  X  +  0.5 

NMINUS  =  NMINUS  ♦  1 
3800  CONTINUE 

00  4000  I  =  1,40 

CCM  =  -  TTW0( 1+2) 

IF(X.GT.CCM)  GO  TO  3850 

X  =  X  -  LNMINS( I ) 

NMINUS  =  NMINUS  ♦  1 

GO  TO  4000 
3850  CCP  =  +  TTWO( 1+2) 

IF(X.GT.CCP)  GO  TO  3860 

NZERO  =  NZERO  ♦  1 

GO  TO  4000 
3860  X  =  X  -  LNPLUS( I ) 

NPLUS  =  NPLUS  +  1 
4000  CONTINUE 

NEPLUS(J)  =  NEPLUS(J)  ♦  NPLUS 

NEZEROU)  =  NEZERO(J)  +  NZERO 

NEMINS(J)  =  NEMINS(J)  ♦  NMINUS 
;      SQUARE  ROOT  ALGORITHM:  RADIX  2»SCALED 

X  =  0.75  *  RAND(KKK)  ♦  0.25 

XO  *  X 
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NMINUS  «  0 

NZERO  *  0 

NPLUS  «  0 
:      TEST  INITIAL  RANGE  OF  X 

IFU.GT. 0.3125)  GO  TO  4100 

CO  -  0*500 

R  -  CO 

RSQ  ■  R  *  R 

X  «  XO  -  RSQ 

NPLUS  ■  NPLUS  +  1 

00  4050  I  •  1*40 

CCM  *  -  TTWOU+3) 

IFU.GT. CCM)  GO  TO  4010 

R  »  R  -  TWOU  +  l) 

RSQ  =  R  *  R 

X  =  XO  -  RSQ 

NMINUS  =  NMINUS  +  1 

GO  TO  4050 
4010  CCP  =  +  TTWO(  1+3) 

IF(X.GE.CCP)  GO  TO  4020 

NZERO  =  NZERO  +  1 

GO  TO  4050 
4020  R  =  R  +  TWOU  +  l) 

RSQ  =  R  *  R 

X  =  XO  -  RSQ 

NPLUS  =  NPLUS  +  1 
4050  CONTINUE 

GO  TO  5000 
4100  IFU.GT. 0.375)  GO  TO  4200 

CO  =  0.5625 

R  =  CO 

RSO  =  R  *  R 

X  =  XO  -  RSQ 

NPLUS  =  NPLUS  +  1 

DO  4150  I  =  1*40 

CCM  =  -  TWOU+2)  -  TTWOU+4) 

IFU.GT. CCM)  GO  TO  4110 

R  =  R  -  TWO( I  +  l) 

RSQ  =  R  *  R 

X  =  XO  -  RSQ 

NMINUS  =  NMINUS  +  1 

GO  TO  4150 
4110  CCP  =  -  CCM 

IF(X.GE.CCP)  GO  TO  4120 

NZERO  =  NZERO  ♦  1 

GO  TO  4150 
4120  R  =  R  +  TWO( I+l) 

RSQ  =  R  *  R 

X  =  XO  -  RSQ 

NPLUS  =  NPLUS  +  1 
4150  CONTINUE 

GO  TO  5000 
4200  IFU.GT. 0.5625)  GO  TO  4300 

CO  =  0.625 

R  =  CO 

RSQ  =  R  *  R 

X  =  XO  -  RSQ 
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NPLUS  ■  NPLUS  ♦  1 

DO  4250  I  ■  lt40 

CCM  ■  -  TWOU  +  l) 

IF(X.GT.CCM)  GO  TO  4210 

R  ■  R  -  TWO( 1*1) 

RSO  ■  p.  ♦  R 

X  «  XO  -  RSO 

NHINUS  ■  NMINUS  ♦  1 

GO  TO  4250 
4210  CCP  ■  -  CCM 

IF(X.GE.CCP)  GO  TO  4220 

NZERO  -  NZERO  ♦  1 

GO  TO  4250 
4220  R  -  R  ♦  TWOU  +  1) 

RSO  *  R  *  R 

X  ■  XO  -  RSO 

NPLUS  «  NPLUS  ♦  1 
4250  CONTINUE 

GO  TO  5000 
4300  IFU.GT. 0.875)  GO  TO  4400 

CO  =  0,75 

R  -    CO 

RSQ  =  R  *  R 

X  s  XO  -  RSO 

NPLUS  =  NPLUS  +  1 

DO  4350  I  =  lf40 

CCM  =  -  TWOU  +  1)  -  TWOU+3) 

IF(X.GT.CCM)  GO  TO  4310 

R  =  R  -  TWO( 1+1) 

RSO  =  R  *  R 

X  =  XO  -  RSO 

NMINUS  =  NMINUS  +  1 

GO  TO  4350 
4310  CCP  =  -  CCM 

IF(X.GE.CCP)  GO  TO  4320 

NZERO  =  NZERO  ♦  1 

GO  TO  4350 
4320  R  =  R  +  TWO( 1+1) 

RSO  =  R  *  R 

X  =  XO  -  RSO 

NPLUS  =  NPLUS  ♦  1 
4350  CONTINUE 

GO  TO  5000 
4400  CO  =  1.0 

R  =  CO 

RSO  =  R  *  R 

X  =  XO  -  RSQ 

NPLUS  =  NPLUS  +  1 

DO  4450  I  =  lf40 

CCM  =  -  TTWO( 1+2) 

IF(X.GT.CCM)  GO  TO  4410 

R  =  R  -  TWOd  +  1) 

RSQ  =  R  *  R 

X  =  XO  -  RSQ 

NMINUS  =  NMINUS  ♦  1 

GO  TO  4450 
4410  CCP  =  -  CCM 
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IF(X.GE.CCP)  GO  TO  4420 

NZERO  -  N2ER0  *    1 

GO  TO  4450 
4420  R  ■  R  ♦  TWO(H-l) 

RSO  «  R  *  R 

X  ■  XO  -  RSO 

NPLUS  ■  NPLUS  ♦  1 
4450  CONTINUE 
5000  CONTINUE 

NSPLUS(J)  -  NSPLUS(J)  ♦  NPLUS 

NSZERO(J)  ■  NSZERO(J)  ♦  NZERO 

NSMINS(J)  -  NSMINS(J)  ♦  NMINUS 
:      TANGENT  ALGORITHM 

X  =  RAND(KKK)  *  PI2 

Rl  *  0.96887271238293440-04 

XH  *  X 
;      TEST  RANGE  OF  OPERAND 

IF(X.LE.Rl)  GO  TO  5100 

IFU.LE.PI4)  GO  TO  5200 

C  =  PI2  -  TW0(9) 

IFU.LT.C)  GO  TO  5300 

D  =  PI2  -  TW0140) 

IF(X.LT.D)  GO  TO  5400 

INOIC  =  5 

TANXH  =  1.0/(PI2  -  XH) 

GO  TO  8100 
5100  INDIC  =  1 

TANXH  =  XH 

GO  TO  8100 
5200  INOIC  =  2 

X  =  XH 

GO  TO  6100 
5300  INDIC  =  3 

X  =  PI2  -  XH 

GO  TO  6100 
5400  INDIC  =  4 

X  =  PI2  -  XH 
6100  NPLUS  =  0 

NZERO  =  0 

NMINUS  =  0 
;      INITIALIZATION 

A  =  1.0 

B  =  0.414213562373095 

X  =  X  -  PI8 

NPLUS  =  NPLUS  ♦  1 
;      BEGIN  NORMAL  ALGORITHM  ITERATION 

IF( INDIC. EO. 4)  GO  TO  6110 

K  =  40 

GO  TO  6120 
6110  K  =  50 
6120  CONTINUE 

DO  7100  I  =  1,K 

CCM  =  -  TTWO( 1+2) 

IF(X.GE.CCM)  GO  TO  6200 

X  =  X  ♦  ARCTAN( I  ) 

SAVEA  =  A 

A  =  A  +  B  *  TWO( I) 
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B  «  B  -  SAVEA  *  TWO(I) 
NMINUS  «  NHINUS  «•  1 
GO  TO  6600 
6200  CCP  ■  -  CCM 

IF(X.GT.CCP)  GO  TO  6300 
NZERO  «  NZERO  +  1 
GO  TO  6600 
6300  X  •  X  -  ARCTANU) 
A 

B  *    TWO(I) 
SAVEA  *  TWOU) 
NPLUS  ♦  1 


6600 
7100 


X  *  X 

SAVEA 

A  »  A 

B  *  B 

NPLUS 

CONTINUE 

CONTINUE 

NTPLUSU) 

NTZERO(J) 

NTMINS(J) 

IF( INDIC.E0.2) 

IFUNDIC.E0.3) 


8100 


8300 


8400 


8510 


NTPLUS( J) 
NTZERO(J) 
NTMINS(J) 
TANXH 
TANXH 
TANXH 


♦  NPLUS 

♦  NZERO 

♦  NMINUS 
B/A 
A/B 
A/B 


CASES 
GO  TO 

8800 


TO  OMIT 
8800 


IF( INDIC.EQ.4) 
CONTINUE 

ARCTANGENT  ALGORITHM 
TAKE  RESULT  OF  TANGENT 
X  =  TANXH 
NMINUS  =  0 
NZERO  =  0" 
NPLUS  =  0 
TEST  FOR  SPECIAL 
IF(X.LT.TW0(20) ) 
T  =  1.0/TWOUO) 
IF(X.GT.T)  GO  TO 
D  =  DABS(X-l.O) 
IF(D.LT.TW0(19) )  GO  TO  8800 
IF(X.GE.l.O)  GO  TO  8601 
HERE  X  IS  LESS  THAN  UNITY 
A  =  X  +  1.0 
B  =  X  -  1.0 

IFU.GT.0.25)  GO  TO  8300 
A  =  1.0 
B  =  X 

NPLUS  =  NPLUS  ♦  1 
GO  TO  8510 
IFU.GT.0.5)  GO  TO 
SAVEA  =  A 
A  =  0.5  *  A  - 
B  =  0.5  *  B  + 
NPLUS  =  NPLUS 
8510 

*  0.5 

*  0.5 
=  NZERO 


ALGORITHM 


ALGORITHM 


8400 


0.25 
0.25 
+  1 


B 
SAVEA 


GO  TO 
A  =  A 
B  =  B 
NZERO 


+  1 


CONTINUE 

BEGIN  NORMAL  ALGORITHM  ITERATION  WITH  R 

DO  8600  I  =  1»40 

CCM  =  -  TTWOU  +  2) 

IF(B.LE.CCM)  GO  TO  8520 

CCP  =  -  CCM 


40 


169 

IF(B.GT.CCP)  GO  TO  8530 

NZERO  *  NZERO  ♦  l 

GO  TO  8540 
8520  SAVEA  *  A 

A  ■  A .  -  B  *  TWO(I) 

B  »  B  +  SAVEA  *  TWOU) 

NMINUS  ■  NMINUS  ♦  1 

GO  TO  8540 
8530  SAVEA  «  A 

A  *  A  ♦  B  *  TWO(I) 

B  *  B  -  SAVEA  *  TWOU) 

NPLUS  =  NPLUS  ♦  1 
8540  CONTINUE 

IFU.EQ.40)  GO  TO  8700 

8600  CONTINUE 

8601  CONTINUE 

C      HERE  X  IS  AT  LEAST  UNITY 

C      FIND  EXPONENT  Y  OF  OPERAND  X 

CALL  LML(X,Y) 
C      LML  RETURNS  Y  =  2**(-E) 

P  =  Y  *  X 

A  =  P  +  Y 

B  =  P  -  Y 

IF  (X.LT.1.5)  GO  TO  8605 

SAVEA  =  A 

A  =  A  +  B 

B  =  B  -  SAVEA 

NMINUS  =  NMINUS  +  1 
8605  NZERO  =  NZERO  +  1 

IFU.GT.1.25)  GO  TO  8610 

A  =  A  *  0.75 

B  =  B  *  0.75 

NPLUS  =  NPLUS  +  1 

GO  TO  8670 
8610  IFtA.GT.1.5)  GO  TO  8620 

A  =  A  *  0.62  5 

B  =  B  *  0.625 

NPLUS  =  NPLUS  +  1 

GO  TO  8670 
8620  A  =  A  *  0.5 

B  =  B  *  0.5 

NZERO  =  NZERO  +  1 
8670  CONTINUE 
C      BEGIN  NORMAL  ALGORITHM  ITERATION  WITH  R  =  40 

DO  8700  I  =  1,40 

CCM  =  -  TTWOU+2) 

IF(B.LE.CCM)  GO  TO  8680 

CCP  =  -  CCM 

IF(B.GT.CCP)  GO  TO  8690 

NZERO  =  NZERO  +  1 

GO  TO  8700 
8680  SAVEA  =  A 

A  =  A  -  B  *  TWO( I ) 

B  =  B  +  SAVEA  *  TWO( I) 

NMINUS  =  NMINUS  +  1 

GO  TO  8700 
8690  SAVEA  =  A 


8700 


8800 
8900 


23000 

24000 
C 


26000 

26100 

27000 

27100 
29000 


NAPLUS(J) 
NAZEROU) 

NAMINS(J) 


ALGORITHM 
*  PI4 


NPLUS 
NZERO 

NMINUS 
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A  =  A  +  B  *  TWO(I) 

B  «  B  -  SAVEA  *  TWOU> 

NPLUS  ■  NPLUS  +  1 

CONTINUE 

NAPLUS(J)  ■ 

NAZERO(J)  * 

NAMINS(J)  ■ 

GO  TO  8900 

CONTINUE 

CONTINUE 

COSINE/SINE 

X  ■  RAND(KKK) 

NMINUS  =  0 

NZERO  =  0 

NPLUS  *    0 

INITIALIZATION 

X  =  X  -  PI8 
NPLUS  =  NPLUS  ♦  1 
NZERO  =  NZERO  +  1 

FIRST  QUARTER  NON-REDUNDANT 

DO  24000  I  =  2,9 

IF(X.GT.O)  GO  TO  23000 

X  =  X  +  ARCTAN( I ) 

NPLUS  =  NPLUS  ♦  1 

GO  TO  24000 

X  =  X  -  ARCTANU  ) 

NMINUS  =  NMINUS  +  1 

CONTINUE 

REST  REDUNDANT 

DO  29000  I  =  10,40 

CCP  =  TTWOd+2) 

IF(X.GT.CCP)  GO  TO  26000 

CCM  =  -  CCP 

IF(X.LE.CCM)  GO  TO  27000 

NZERO  =  NZERO  +  1 

GO  TO  29000 

X  =  X  -  ARCTAN( I ) 

IFU.LE.20)  NPLUS  =  NPLUS  +  1 

NPLUS  =  NPLUS  +  1 

GO  TO  29000 

X  =  X  +  ARCTAN( I ) 

IFd.LE.20)  NMINUS  =  NMINUS  +  1 

NMINUS  =  NMINUS  ♦  I 

CONTINUE 

NCPLUS( J) 
NCZERO( J) 
NCMINS( J) 
ALGORITHM 
61tl00 


NCPLUS(J)  = 
NCZERO(J)  = 
NCMINS(J)  = 
SQUARE  ROOT 
DO  30000  I  = 


+  NPLUS 
♦  NZERO 
+  NMINUS 
HIGHER  RADIX, 


NOT  SCALED 


TWO( I )  =  0.0 
30000  TTW0( I )  =  0.0 

X  =  0.75  *  RAND(KKK) 

Y  =  X 

NMINUS  =  0 

NZERO  =  0 

NPLUS  =  0 

ALPHA  =  X 


*    0.25 
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C      INITIAL  STEP 

IF(X.GE.0.5)  GO  TO  30100 

X  =  X  *  4.0 

ALPHA  *  ALPHA  *  2.0 
30100  NZERO  ■  NZERO  ♦  1 

DO  39000  I  «  1,40 

CCP  -  1.0  ♦  TTWOU+2) 

IF(X.GE.CCP)  GO  TO  30200 

CCM  ■  1.0  -  TTWOU+2) 

IF(X.GT.CCM)  GO  TO  30300 

X  =  X  *  (1.0  ♦  TWO(I)  ♦  TWO(2*I*2)> 

ALPHA  =  ALPHA  *  (1.0  +  TWOU  +  1)) 

NPLUS  =  NPLUS  +  1 

GO  TO  30400 
30200  X  =  X  *  (1.0  -  TWO(I)  +  TWO( 2* 1+2 ) ) 

ALPHA  =  ALPHA  *  (1.0  -  TWOU  +  1)) 

NMINUS  =  NMINUS  +  1 

GO  TO  30400 

NZERO  =  NZERO  +  1 

CONTINUE 


30300 
30400 
39000 


NOPLUS(J) 
NOZERO( J) 
NOMINS( J) 


CONTINUE 

NOPLUS(J) 

NQZERO( J ) 

NOMINS(J) 
99000  CONTINUE 

WRITE(6,24) 

WRITE  (6,36) 

WRITE(6,25) 

WRITE(6,36) 

WRITE(6,26) 

WRITE(6,36) 

WRITE(6,27) 

WRITE(6,36) 

WRITE(6,28 ) 

WRITE(6,36) 

WRITE(6,29) 

WRITE(6,36) 

WRITE(6,30) 

WRITE (6,36) 

WRITE(6,27) 

WRITE (6,36) 
99999  CONTINUE 
C      CALCULATE  PROBABILITY 

MMULT  =  0 

MMULTZ  =  0 

MDIV  =  0 

MDIVZ  =  0 

MEXP  =  0 

MEXPZ  =  0 

MS0T1  =  0 

MSQT1Z  =  0 

MTAN  =  0 

MTANZ  =  0 

MATN  =  0 

MATNZ  =  0 

MCOS  =  0 

MCOSZ  =  0 


NPLUS 
NZERO 
NMINUS 


( I,NMPLUS( I),NMZERO(I ),NMMINS( 
I,NDPLUS( I) ,NDZERO(I ) ,NDMINS(I 
I,NEPLUS( I ) ,NEZERO( I) ,NEMINS( I 
I,NSPLUS( I ) ,NSZERO(I ) ,NSMINS( I 
I, NT  PLUS ( I ) ,NTZERO(I ) ,NTMINS(I 
I,NAPLUS( I ) ,NAZERO(I ) ,NAMINS( I 
I,NCPLUS( I ) ,NCZERO( I ) ,NCMINS( I 
I,NQPLUS( I ) ,NOZERO( I) ,NOMINS( I 


),I=1,16) 


=1,16) 


=1,16) 


1,16) 


=1,16) 


=1,16) 


=1,16) 


=1,16) 


OF  A  ZERO  FROM  THESE  DATA 
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MSQT2  ■  0 
MSQT2Z  ■  0 
00  99100  I  -  ltl6 

MMULT  »  MMULT  ♦  NMPLUSU)  ♦  NMMINS(I) 
MHULTZ  *  MMULTZ  ♦  NMZERO(I) 
MDIV  *  MDIV  ♦  NDPLUS(I)  ♦  NDMINS(I) 
MDIVZ  ■  MDIVZ  +  NDZERO(I) 
HEXP  >  MEXP  ♦  NEPLUSU)  ♦  NEMINS(I) 
MEXPZ  -  HEXPZ  ♦  NEZEROU) 
MS0T1  «  MSQT1  ♦  NSPLUS(I)  ♦  NSKINSU) 
MSQT1Z  «  MS0T1Z  +  NSZEROU) 
MTAN  »  MTAN  ♦  NTPLUSU)  ♦  NTMINSU) 
MTANZ  =  MTANZ  ♦  NTZEROU) 
MATN  *  MATN  +  NAPLUSU)  ♦  NAMINS(I) 
MATNZ  =  MATNZ  ♦  NAZEROU) 
MCOS  =  MCOS  ♦  NCPLUS(I)  ♦  NCMINSU) 
MCOSZ  =  MCOSZ  ♦  NCZEROU) 
MS0T2  =  MS0T2  ♦  NQPLUSU)  ♦  NQMINS(I) 
MSQT2Z  =  MS0T2Z  ♦  NOZERO(I) 
99100  CONTINUE 

AMULT  =  MMULT  ♦  MMULTZ 

AMULT  =  NMULTZ 

PMULT  =  AMULTZ/AMULT 

WRITE  (6,78) 

WRITE  (6,14)  PMULT 

DIV  =  MOIV  +  MDIVZ 

DIVZ  =  MDIVZ 

PDIV  =  DIVZ/DIV 

EXP  =  MEXP  +  MEXPZ 

EXPZ  =  MEXPZ 

PEXP  =  EXPZ/EXP 

S0T1  =  MS0T1  ♦  MS0T1Z 

SQT1Z  =  MS0T1Z 

PS0T1  =  S0T1Z/SQT1 

TAN  =  MTAN  +  MTANZ 

TANZ  =  MTANZ 

PTAN  =  TANZ/TAN 

ATN  =  MATN  ♦  MATNZ 

ATNZ  =  MATNZ 

PATN  =  ATNZ/ATN 

COS  =  MCOS  +  MCOSZ 

COSZ  =  MCOSZ 

PCOS  =  COSZ/COS 

SQT2  =  MSQT2  «■  MSQT2Z 

SQT2Z  =  MS0T2Z 

PS0T2  =  S0T2Z/SQT2 

WRITE(6,69) 

WRITE  (6,71) 

WRITE(6,14)  PMULT 

WRITE  (6,14)  PDIV 

WRITE  (6,72) 

WRITE  (6,14)  PEXP 

WRITE  (6,73) 

WRITE  (6,14)  PSQT1 

WRITE  (6,74) 

WRITE  (6,14)  PTAN 

WRITE  (6,75) 
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WRITE  (6,14)  PATN 

WRITE  (6,76) 

WRITE  (6,14)  PCOS 

WRITE  (6,77) 

WRITE  (6,14)  PSOT2 
99200  CONTINUE 

WRITE(6,21) 

WRITE(6,20)  IX 
C      CONSTANTS  IN  IBM  360  HEXADECIMAL  FORMAT 

STOP 

END 
//GO.SYSIN  OD  * 
SENTRY 

4080000000000000  I 

4040000000000000  2 

4020000000000000  3 

4010000000000000  4 

3F80000000000000  5 

3F40000000000000  6 

3F20000000000000  7 

3F10000000000000  8 

3E80000000000000  9 

3E40000000000000  10 

3E20000000000000  11 

3E10000000000000  12 

3D80000000000000  13 

3D40000000000000  14 

3D20000000000000  15 

3D10000000000000  16 

3C80000000000000  17 

3C40000000000000  18 

3C20000000000000  19 

3C10000000000000  20 

3B80000000000000  21 

3B40000000000000  22 

3B20000000000000  23 

3B10000000000000  24 

3A80000000000000  25 

3A40000000000000  26 

3A20000000000000  27 

3A10000000000000  28 

3980000000000000  29 

3940000000000000  30 

3920000000000000  31 

3910000000000000  32 

3880000000000000  33 

3840000000000000  34 

3820000000000000  35 

3810000000000000  36 

3780000000000000  37 

3740000000000000  38 

3720000000000000  39 

3710000000000000  40 

3680000000000000  41 

3640000000000000  42 

3620000000000000  43 

3610000000000000  44 


3580000000000000  4f 

3540000000000000  ♦* 

3520000000000000  4? 

3510000000000000  *• 

3480000000000000  41 

3440000000000000  10 

3420000000000000  II 

3410000000000000  ft 

3380000000000000  It 

3340000000000000  I* 

3320000000000000  II 

3310000000000000  I* 

3280000000000000  IT 

3240000000000000  II 

3220000000000000  II 

3210000000000000  60 

4118000000000000  1 

40C0000000000000  2 

4060000000000000  I 

4030000000000000  4 

4018000000000000  I 

3FC0000000000000  6 

3F60000000000000  7 

3F30000000000000  8 

3F18000000000000  9 

3ECO0O0OOOOO000O  10 

3E60000000000000  11 

3E30000000000000  12 

3E18000000000000  13 

3DC0000000000000  1* 

3D60000000000000  1* 

3030000000000000  1* 

3D18000000000000  I7 

3CCOOOOOOO0OO00O  II 

3C60000000000000  I9 

3C30000000000000  20 

3C18000000000000  21 

3BC0000000000000  22 

3B60000000000000  23 

3B30000000000000  24 

3B18000000000000  25 

3ACO0OOOOOOOOO0O  26 

3A60000000000000  27 

3A30000000000000  28 

3A18000000000000  29 

39C0000000000000  30 

3960000000000000  31 

3930000000000000  32 

3918000000000000  33 

38C0000000000000  34 

3860000000000000  35 

3830000000000000  36 

3818000000000000  37 

37COOOOOOOOOOOOO  38 

3760000000000000  39 

3730000000000000  40 

3718000000000000  41 
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36C0000000000000 
3660000000000000 
3630000000000000 
3616000000000000 
35C0000000000000 
3560000000000000 
3530000000000000 
3518000000000000 
34CO000OO00OOO00 
3460000000000000 
3430000000000000 
3418000000000000 
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